Best LLM Evaluation Podcasts (2025)

Season 6, Episode 16: Can an LLM evaluate ad creative? (with Luca Fiaschi) 47:07

3d ago

My guest on this episode of the podcast is Luca Fiaschi, a machine learning expert who previously held executive data science roles at MistPlay, StitchFix, and HelloFresh. Luca is now a Partner for the Generative AI vertical at PyMC Labs, a consultancy that specializes in the application of Bayesian methods to business problems and which maintains …

EP 628: What’s the best LLM for your team? 7 Steps to evaluate and create ROI for AI 38:52

30d ago

How can you measure ROI on GenAI for your team? 🤔 Internal evaluations and intentionality. We've helped thousands of orgs put LLMs to work and ACTUALLY save time. On today's show, we're dishing the 7 steps you need to follow. What’s the best LLM for your team? 7 Steps to evaluate and create ROI for AI -- An Everyday AI chat with Jordan Wilson Newsl…

Ep 649: The 7 Types of AI Agents and the 10 Top Agents for Businesses to Grow 50:50

41m ago

There's like a bajillion AI agents. 🤖 But most of the REAL agents fall into these 7 categories that you need to understand. Oh.... and don't worry. We'll break down the top 10 AI Agents for business growth. Join us as we go over Agents 101, the 7 categories of AI agents, and the 10 you should be paying most attention to. Ep 649: The 7 Types of AI A…

Ep 648: How 74% of Enterprises Get Real AI ROI While Pundits See Failure 38:11

23m ago

Is it AI failure or AI success? 🤔 We see massive trillion dollar valuations for AI companies, yet constant ‘AI bubble’ bust stories. And we see stories playing out in the media that say AI is both an enterprise boon and a complete waste of time. Welp…. A new study from Wharton will hopefully put this to rest. Among other things, it shows that 74% o…

Ep 647: The New Secret Google Gemini Feature that Quietly Kills PowerPoint slides 31:37

23h ago

Three words to Google Gemini and you can kiss your PowerPoint woes goodbye. 👋 Google Gemini quietly rolled out a kinda secret feature that TBH was deserving of a keynote. So how do you create slides in Google Gemini? And what are the pros and the limitations? Tune in as we put AI to Work on Wednesdays. The New Secret Google Gemini Feature that Quie…

Building an AI Mathematician with Carina Hong - #754 55:52

4d ago

In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a convergence of three key areas: the advanced reasoning capabilities of modern LLMs, the rise of formal proof languages like Lean, and breakthroughs in code …

Ep 646: OpenAI: How can a former nonprofit losing $12 billion a quarter go public at $1 trillion? 49:27

2d ago

OpenAI: reportedly losing $12 billion a quarter. 🥵 Also OpenAI: reportedly going public at a $1 trillion market cap. 🤑 The math aint mathin, right? Or is it? Join Everyday AI for our Hot Take Tuesday breaking down OpenAI's new structure, its new deal with Microsoft, and whether they're likely to keep losing money or be Wall Street's next darling. N…

Claude, TypingMind, AMP & MCP Servers: The Future Dev 53:24

4d ago

How do you give an agent the same visibility a human developer has, without giving it full control? Alan Pope, Senior Developer Advocate at Tessl, explains how Model Context Protocols (MCPs) give AI agents structured access to dev environments, enabling tools like Claude Code and TypingMind to read, build, and execute safely under human oversight. …

EP 645: OpenAI’s $1 trillion IPO, Google Gemini closing ground on ChatGPT, Microsoft’s AI app builder & more AI News That Matters 38:21

6d ago

Big AI deals. 🤝 Titans chasing startups. 🏃 Vibe coding with never-before-seen ease and enterprise AI following you to the apps you use every day. 🪄 As always, we saw some huge AI moves this week. How big? A (potential) $1 trillion IPO, hundreds of millions of users and 30K jobs cut. If you missed all the AI movement, we'll get you caught up and hel…

#744: Amazon Bedrock AgentCore, Amazon EC2 Capacity Manager, and so much more! 31:36

4d ago

Simon and Jillian walk you through all the new and interesting updates. By Amazon Web Services

If You Can’t Test It, Don’t Deploy It: The New Rule of AI Development? 22:50

3d ago

Magdalena Picariello reframes how we think about AI, moving the conversation from algorithms and metrics to business impact and outcomes. She champions evaluation systems that don't just measure accuracy but also demonstrate real-world business value, and advocates for iterative development with continuous feedback to build optimal applications. Rea…

Season 6, Episode 15: MDM Mailbag #6 (with Sylvain Gauchet) 49:08

6d ago

The sixth installment of the Mobile Dev Memo mailbag features app monetization expert Sylvain Gauchet. Sylvain formerly served as Babbel's US Director of Revenue Strategy and now works with a number of subscription apps on revenue growth as an advisor and fractional executive. Additionally, Sylvain runs the GrowthGems newsletter, for which he scour…

Ep 644: 5 Underrated ChatGPT Features You Should Be Using But Aren’t (Replay) 33:41

6d ago

ChatGPT Agents and Atlas have taken all the spotlight. 🤖 But these 5 underrated ChatGPT features can instantly improve your results. Join us as we uncover them and give you a leg up on everyone else. 5 Underrated ChatGPT Features You Should Be Using But Aren’t -- An Everyday AI Chat with Jordan Wilson (Replay) Newsletter: Sign up for our free daily…

Ep 643: Amazon Cuts 30,000 jobs in AI push. What this means for the U.S. economy. 40:24

7d ago

In an AI push, Amazon has already axed 14,000 jobs and that total is reportedly going to hit 30,000. 🪓 Is this because Amazon overhired during the pandemic? Or, is this a sign that AI is now capable enough that most enterprises will cut thousands of jobs? Tune in as we discuss. Amazon Cuts 30,000 jobs in AI push. What this means for the U.S. ec…

Ep 642: Most Slept On Claude Feature? Simplest Way To Create Files In An AI Chat 39:47

8d ago

One small but fatal flaw of most LLMs? 💩 All your insights and deliverables kinda sit and die in those deserted chats. It can be tricky or nearly impossible to have AI chatbots simply create file types consistently. That's changing with this ONE overlooked feature inside Anthropic's Claude. Tune in as we put AI to Work on Wednesdays and start savin…

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753 52:23

11d ago

In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling proces…

AI Agents Beyond Context Limits | Maksim Shaposhnikov 58:26

9d ago

Bots follow scripts. Assistants wait for your commands. Agents act autonomously. Maksim Shaposhnikov, AI Research Engineer at Tessl, joins Simon Maple to unpack the capabilities of AI coding agents, including how developers can test and trust the code they generate. On the docket: • how sub-agents operate independently, maintaining their own contex…

Ep 641: ChatGPT Ads: 9 Reasons why personalized ads are coming to ChatGPT soon 44:31

9d ago

ChatGPT ads are coming. 📰 They’re gonna be both crazy intrusive yet also pretty useful. That’s a given. But the real hot take here: personalized ChatGPT ads are actually gonna change how the internet works and conversational commerce is going to be the new norm. Every single company — including yours — is going to have to quickly adapt. We lay out …

EP 640: OpenAI’s new agentic browser, Microsoft releases dozens of AI features, Meta slashes hundreds of AI jobs and more AI news 36:33

10d ago

Apparently this week was the week of Agentic Browsers? 🤷‍♂️ But, OpenAI's Atlas might not even be a top 3 AI news story of the week. We had hundreds of AI job cuts at Meta, Microsoft unveiled dozens of new AI features and AI music giant Suno may have a surprise competitor. Get caught up and get ahead with Everyday AI's weekly AI News That Matters s…

#743: The Frugal Architect w/ Werner Vogels: The Ocean Cleanup's mantra: Start simple and iterate relentlessly 40:45

10d ago

When Boyan Slat found more plastic than fish on a dive in Greece, he asked a simple question: "Why can't we just clean this up?" He was 16. What began as a humble project funded with pocket money has grown into a global initiative, removing millions of pounds of plastic from the world's rivers and oceans in the last decade. But simple questions don'…

Effective Error Handling: A Uniform Strategy for Heterogeneous Distributed Systems 37:47

10d ago

Jenish Shah, a back-end engineer focused on distributed systems at Netflix, provides more insights on how to handle failures in a distributed systems setup. He shares details on how he built a library that handles exceptions uniformly, regardless of the underlying communication protocol. Read a transcript of this interview: http://bit.ly/3JpmIBn Sub…

Ep 639: Microsoft’s surprise AI updates: 5 categories of new AI tools and features 35:17

13d ago

Blink and you’ve missed a few dozen Microsoft AI updates. And obviously agentic browser updates in Edge. If you missed Microsoft’s Copilot Sessions Fall Update, then you might be stuck scratching your head trying to decipher AI updates like that one street sign that no one understands. Don’t worry. We did the homework for you. Join us as we break d…

Beyond Accuracy: Evaluating the learned representations of Generative AI models | Aida Nematzadeh 53:17

14d ago

Dr. Aida Nematzadeh is a Senior Staff Research Scientist at Google DeepMind, where her research focuses on multimodal AI models. She develops evaluation methods and analyzes models’ learning abilities to detect failure modes and guide improvements. Before joining DeepMind, she was a postdoctoral researcher at UC Berkeley and completed her …

Ep 638: Agentic Browser Showdown. ChatGPT Atlas vs. Perplexity Comet 43:11

14d ago

(Kinda) Hot take 🔥 AI agents kinda stink. (For now.) If you want to get more done with AI, ditch the “general” agents until they catch up. Want gains today? Agentic browsers are the real winners. (Like OpenAI's just-released Atlas browser.) Agentic Browsers are powered by the world's smartest models and actually keep your context and finish multi-s…

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752 1:12:36

17d ago

Today, we're joined by Alexandre Pesant, AI lead at Lovable, who joins us to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explor…

Ep 637: ChatGPT’s New Agentic browser: Hands on with OpenAI’s Atlas 46:06

15d ago

ChatGPT just released their agentic browser, Atlas. 🌏 Will it kill Chrome? What does it do? How does it incorporate ChatGPT? We'll answer those questions and more on today's show. ChatGPT’s New Agentic browser: Hands on with OpenAI’s Atlas -- An Everyday AI Chat with Jordan Wilson Newsletter: Sign up for our free daily newsletter More on this Episo…

Cloud and DevOps InfoQ Trends Report 2025 50:46

16d ago

In this episode of the podcast, members of the InfoQ editorial staff and friends of InfoQ will discuss current trends in the cloud and DevOps domains as part of our annual trends report creation process. These reports provide InfoQ readers with a high-level overview of key topics to watch and also help the editorial team focus on innovative technol…

Season 6, Episode 14: Zero-to-one product growth (with Daphne Tideman) 49:49

16d ago

In this week's episode of the podcast, I speak with Daphne Tideman, a product growth expert who runs the Growth Waves newsletter. The topic of our conversation is "zero-to-one growth": the tactics developers can utilize to validate and optimize their product to ultimately enable scaled user acquisition. Among other things, we cover: The purpose of …

Ep 636: Uber paying drivers $1 to train AI models? A sign of what’s next 35:46

16d ago

Uber is paying its drivers as little as $1 to train LLMs. 😯 Smart business move or eerie sign of what's to come? On this Hot Take Tuesday episode, we uncover the trend of dirt cheap data labeling, why it's a good thing and a bad thing, and how this is actually a sign of what's next. Uber paying drivers $1 to train AI models? A sign of what’s next --…

Instant PR Feedback Without leaving GitHub | Merrill Lutsky on Graphite 47:00

16d ago

As AI outpaces human review, latency compounds. On AI Native Dev, Graphite co-founder and CEO, Merrill Lutsky joins Guy Podjarny to explore how stack aware reviews remove friction and accelerate AI-native development. They also get into: • how Graphite’s architecture ensures traceability across AI generated commits • what engineering velocity means…

Ep 635: ChatGPT & Wal-Mart team up for AI shopping, Google drops Veo 3.1, Claude Skills get released & more 35:07

17d ago

Could the combo of ChatGPT and Wal-Mart take on Amazon? 🥊 What are Claude Skills and why do they matter? 🤔 Did Sora 2 already get dethroned by Veo 3.1? 📹 A lot happened in the AI world this week. We'll break it down so you don't have any questions. ChatGPT and Wal-Mart team up for AI shopping, Google drops Veo 3.1, Claude Skills get released and mor…

#742: Amazon Quick Suite, AWS MCP, and lots more! 30:35

17d ago

Stay up to date with almost 60 new updates this week!! By Amazon Web Services

Ep 634: AI Hype Is Over. Here’s What Your Business Should Do Next (Replay) 45:05

20d ago

AI hype is well over. 🥱 Using AI is no longer a competitive edge, y'all. It’s as basic as having internet access. (You wouldn't put that on your marketing materials, would ya?) If your business is still treating AI as something special, or stuck trying to look innovative just by using it, this episode is for you. Find out what actually matters now and…

Ep 633: The 3 Big Obstacles Holding AI Adoption Back 33:04

21d ago

Jeetu Patel knows a few AI secrets. As the President of one of the largest companies in the world, he's helped pave the AI adoption roadmap. At Cisco, they provide full-stack, enterprise AI solutions spanning infrastructure, security, observability, and operations to the world's largest companies. So naturally, Jeetu could write a legit playbook on…

Ep 632: ChatGPT Apps: 3 Hands-on approaches to save time today 43:34

24d ago

You haven't used ChatGPT's Apps yet? 🫠 Oh.... you like wasting time? Even for free users, ChatGPT rolled out its new Apps mode that promises to shift the future of work. Don't know how to work it? Don't know where to start? Join us as we share 3 practical ways to start saving time today. ChatGPT Apps: 3 Hands-on approaches to save time today -- An …

Dataflow Computing for AI Inference with Kunle Olukotun - #751 57:37

25d ago

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at SambaNova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match th…

Season 6, Episode 13: MDM Mailbag #5 (with Kate Minogue) 50:58

24d ago

In this week's episode of the podcast, I speak with Kate Minogue, a fractional CPO and advisor for consumer and ad tech companies. Kate also runs the AI Leadership Lab, an AI leadership course. Previously, Kate worked in marketing measurement at Meta. This episode is the fifth installment of the MDM Mailbag series, in which I bring experts onto the…

Ep 631: AI’s App Store Moment? Are ChatGPT’s Apps The Next Big Thing or Smoke and Mirrors? 35:58

25d ago

Will this be AI's 'App Store Moment'? 🤔 OpenAI's Apps are live, and the consensus is split. Some are calling them a revolutionary step forward while others are saying it's another marketing flop. What's our hot take? Join us and find out. AI’s App Store Moment? Are ChatGPT’s Apps The Next Big Thing or Smoke and Mirrors? An Everyday AI Chat with Jor…

AI-First Project Management for Developers | Alex Gavrilescu on Backlog.md 46:08

25d ago

Even the smartest AI agent starts as a blank slate. Alexandru Gavrilescu, creator of Backlog.md, and Simon Maple explore how to give AI the right context and specifications so it can deliver like a human teammate, and sometimes faster. On the docket: • why humans still matter for review, but AI can accelerate work beyond traditional sprints • the r…

EP 630: OpenAI brings Apps and Agents to ChatGPT, Google drops Gemini Enterprise, is the AI bubble here and more 38:19

25d ago

OpenAI debuted the future of ChatGPT with Agents and Apps. How will that impact work? 🤖 Google dropped Gemini for Enterprise. Does that make them the top AI option for the big players? 🏢 Everyone is talking about the AI bubble. Is it real and will it burst? 🫧 If you have questions over what's happening in the world of AI news, we've got answers. Jo…

#741: Modernizing Edge Infrastructure: Booking.com's Journey with AWS CloudFront and Lambda@Edge 1:03:45

24d ago

In this episode of the AWS Podcast, host Jillian Forde discusses the migration journey of Booking.com to AWS with Ali and Sarah. They explore the challenges faced by Booking.com, the benefits of using CloudFront and Lambda at Edge, and the importance of observability and cost optimization. The conversation also delves into chaos engineering practi…

Mental Models in Architecture & Societal Views of Technology: A Conversation with Nimisha Asthagiri 51:51

26d ago

In this podcast, Michael Stiefel spoke with Nimisha Asthagiri about the importance of system thinking, multi-agent systems, the consequences of society applying a technology into an area for which it was not designed, and whether we can ever have a healthy relationship with artificial intelligence. System thinking emphasizes the importance of menta…

Ep 629: Google’s surprise release: Will Gemini Enterprise Compete with ChatGPT and Microsoft Copilot? 39:45

29d ago

Breaking: Google just released Gemini Enterprise. 🚨 Will it be a ChatGPT or Microsoft Copilot killer? We got our hands on a version of the newest release and will break down everything you need to know, including the ONE feature that could ultimately set Gemini Enterprise apart. Google Gemini Enterprise: Coming for ChatGPT and Microsoft Copilot? --…

Ep 627: NotebookLM: New features, what’s next and complete walkthrough 39:00

1M ago

Have you been sleeping on NotebookLM? 😴 If so, you're leaving hours of productivity (and probably a lot of money) at the door. But real talk -- the team is shipping fast. The NotebookLM you met last year from the viral Audio Overviews is not the NotebookLM of today. It's slowly turned into a robust, multimedia powerhouse. And the last feature updat…

Season 6, Episode 12: The eCommerce creative opportunity (with Dan Pantelo) 49:33

1M ago

My guest on this week's episode of the podcast is Dan Pantelo, the CEO and founder of Marpipe, a platform that enables eCommerce companies to build dynamic product ads. In our conversation, we discuss: The necessity of exhaustive creative experimentation in eCommerce advertising Whether and how advertisers can create an effective feedback loop betw…

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 57:23

1M ago

Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI, to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the…