Whistle While You Work is a show where Tobin Davies interviews people with inspiring and interesting careers. Through Tobin’s candid conversations with his guests, listeners get motivation and valuable lessons on subjects like leadership, entrepreneurship, branding, innovation, lifestyle, and virtue. Want to connect? Find @tbdavies on Twitter.
Tobin Davies Podcasts

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Sam Charrington
Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers, and tech-savvy business and IT leaders. It is hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator, and thought leader. Technologies covered include machine learning, artificial intelligence, de ...

Autoformalization and Verifiable Superintelligence with Christian Szegedy - #745
1:11:48
In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process …

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744
1:10:20
Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on …

Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743
1:01:01
Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-re…

Closing the Loop Between AI Training and Inference with Lin Qiao - #742
1:01:11
In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing…

Context Engineering for Productive AI Agents with Filip Kozera - #741
46:01
In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current…

Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740
1:13:02
In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneo…

Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739
1:13:02
In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents—from the models and APIs to the critical orchestration layer that manages…

Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738
1:00:29
Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that incorpora…

Building the Internet of Agents with Vijoy Pandey - #737
56:13
Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, pr…

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736
59:31
Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for tradin…

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735
56:45
Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling riv…

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734
1:25:21
Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matri…

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and cre…

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732
57:09
Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples o…

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
1:01:25
Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust altern…

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730
1:07:27
Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reaso…

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
56:18
Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by p…

Generative Benchmarking with Kelly Hong - #728
54:17
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchm…

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
1:34:06
In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by rep…

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726
51:45
Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of a…

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725
1:09:07
Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machine learning, including vision-language models and generative AI techniques to improve perception, planning, and simulation for its self-driving vehicl…

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724
50:32
Today, we're joined by Julie Kallini, PhD student at Stanford University to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models—including inefficient compression…

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723
58:38
Today, we're joined by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture which uses recurrent depth to enable “thinking in l…

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
42:11
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig i…

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721
49:29
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “S1: Simple Test-Time Scaling.” We explore the motivations behind S1, as well as how it compares to OpenAI's O1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well…

Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720
1:07:05
Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances per…

π0: A Foundation Model for Robotics with Sergey Levine - #719
52:30
Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the ro…

AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718
1:44:59
Today we’re joined by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems–reasoning, acting, communicating, and ada…

Speculative Decoding and Efficient LLM Inference with Chris Lott - #717
1:16:30
Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language model inference. We explore the challenges presented by the LLM encoding and decoding (aka generation) and how these interact with various hardware constraints such as FLOPS, memory footprint and memory bandwidth to limit…

Ensuring Privacy for Any LLM with Patricia Thaine - #716
51:33
Today, we're joined by Patricia Thaine, co-founder and CEO of Private AI to discuss techniques for ensuring privacy, data minimization, and compliance when using 3rd-party large language models (LLMs) and other AI services. We explore the risks of data leakage from LLMs and embeddings, the complexities of identifying and redacting personal informat…

AI Engineering Pitfalls with Chip Huyen - #715
57:37
Today, we're joined by Chip Huyen, independent researcher and writer to discuss her new book, “AI Engineering.” We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines …

Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - #714
58:08
Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One to discuss the evolution of the company’s approach and insights on Generative AI and platform best practices. In this episode, we dig into the company’s platform-centric approach to AI, and how they’ve been evolving their existing MLOps and data platforms to …

Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713
1:08:49
Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other similar uses of LLM, explore various use cases for them, and dig into ways to create smarter agentic systems. Dan shared his “big …

Automated Reasoning to Prevent LLM Hallucination with Byron Cook - #712
56:48
Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS to dig into the underlying technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore…

AI at the Edge: Qualcomm AI Research at NeurIPS 2024 with Arash Behboodi - #711
54:47
Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research to discuss the papers and workshops Qualcomm will be presenting at this year’s NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that …

AI for Network Management with Shirley Wu - #710
53:44
Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper’s customers, including…

Why Your RAG System Is Broken, and How to Fix It with Jason Liu - #709
58:03
Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG system, the different signs Jason looks for to identify looming problems, the issues he most commonly en…

An Agentic Mixture of Experts for DevOps with Sunil Mallya - #708
1:15:09
Today we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel "CoMELT" observability dataset which combines traditional MELT data—metrics, events, logs, and traces—with code to efficientl…

Building AI Voice Agents with Scott Stephenson - #707
1:01:44
Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram to discuss voice AI agents. We explore the importance of perception, understanding, and interaction and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs as well as speech-to-text and text-to-speech models in …

Is Artificial Superintelligence Imminent? with Tim Rocktäschel - #706
55:52
Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book, “Artificial Intelligence: 10 Things You Should Know.” We dig into the attainability of artificial superintelligence and the path to …

ML Models for Safety-Critical Systems with Lucas García - #705
1:16:06
Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the “W” ad…

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704
54:22
Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying…

AI Agents for Data Analysis with Shreya Shankar - #703
48:24
Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the …

Stealing Part of a Production Language Model with Nicholas Carlini - #702
1:03:30
Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language mode…

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701
1:14:15
Today, we're joined by Simon Willison, independent researcher and creator of Datasette to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to boost their productivity. We dig into Simon’s own workflows and how he uses popular models like ChatGPT and Anthropic’s Claude to write and test hundr…

Automated Design of Agentic Systems with Shengran Hu - #700
59:30
Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic system designs. We explore the spectrum of agentic behaviors, the motivation for learning all aspects of agentic system design, the key components of th…

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699
45:34
Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of applying academic fairness metrics in real-world AI applications. We dig into the key ethical principles behind the Act, its broad definition of AI, and ho…

The Building Blocks of Agentic Systems with Harrison Chase - #698
59:17
Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, including the most productive developer experiences and appropriate levels of abstraction. We dive into agents and agentic systems as well, covering the “sp…

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697
46:37
Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the motivations for AI engineers to move model inference from the cloud to local devices, and explore the challenges associated with on-device AI. We dig…

Genie: Generative Interactive Environments with Ashley Edwards - #696
46:51
Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training deep reinforcement learning (RL) agents at scale in a completely unsupervised manner. We explore the motivations behind Genie, the challenges of data ac…