Search a title or topic

Over 20 million podcasts, powered by 

Player FM logo
Artwork

Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Player FM - Podcast App
Go offline with the Player FM app!

Are Evals Dead?

25:24
 
Share
 

Manage episode 508585151 series 3241972
Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

AI Conversations Powered by Prosus Group

Your AI agent isn’t failing because it’s dumb—it’s failing because you refuse to test it. Chiara Caratelli cuts through the hype to show why evaluations—not bigger models or fancier prompts—decide whether agents succeed in the real world. If you’re not stress-testing, simulating, and iterating on failures, you’re not building AI—you’re shipping experiments disguised as products.

Guest speaker: Chiara Caratelli - Data Scientist @ Prosus Group

Host: Demetrios Brinkmann - Founder of MLOps Community

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)]

Sign up for the next meetup: [https://go.mlops.community/register]

MLOps Swag/Merch: [https://shop.mlops.community/]

  continue reading

473 episodes

Artwork

Are Evals Dead?

MLOps.community

56 subscribers

published

iconShare
 
Manage episode 508585151 series 3241972
Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

AI Conversations Powered by Prosus Group

Your AI agent isn’t failing because it’s dumb—it’s failing because you refuse to test it. Chiara Caratelli cuts through the hype to show why evaluations—not bigger models or fancier prompts—decide whether agents succeed in the real world. If you’re not stress-testing, simulating, and iterating on failures, you’re not building AI—you’re shipping experiments disguised as products.

Guest speaker: Chiara Caratelli - Data Scientist @ Prosus Group

Host: Demetrios Brinkmann - Founder of MLOps Community

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)]

Sign up for the next meetup: [https://go.mlops.community/register]

MLOps Swag/Merch: [https://shop.mlops.community/]

  continue reading

473 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Copyright 2025 | Privacy Policy | Terms of Service | | Copyright
Listen to this show while you explore
Play