Best Nicolay Gerold Podcasts (2025)

AI in the Terminal: Enhancing Coding with Warp 1:04:30

30m ago

Nicolay here. Most AI coding tools obsess over automating everything. This conversation focuses on the right balance between human skill and AI assistance - where manual context beats web search every time. Today I have the chance to talk to Ben Holmes, a software engineer at Warp, where they're building the AI-first terminal. Manual context engine…

#052 Don't Build Models, Build Systems That Build Models 59:22

23d ago

Nicolay here. Today I have the chance to talk to Charles from Modal, who went from doing a PhD on neural network optimization in the 2010s - when ML engineers could build models with a soldering iron and some sticks - to architecting serverless infrastructure for AI models. Modal is about removing barriers so anyone can spin up a hundred GPUs in se…

#051 Build systems that can be debugged at 4am by tired humans with no context 1:05:51

1M ago

Nicolay here. Today I have the chance to talk to Charity Majors, CEO and co-founder of Honeycomb, who has recently been writing about the cost crisis in observability. "Your source of truth is production, not your IDE - and if you can't understand your code there, you're flying blind." The key insight is architecturally simple but operationally tra…

#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 1:06:57

2M ago

Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production. Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about produc…

#050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 11:00

2M ago

Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production. Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about produc…

#049 BAML: The Programming Language That Turns LLMs into Predictable Functions 1:02:38

2M ago

Nicolay here, I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few basis points. If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems. If your LLM-powered app can’t survive a m…

#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions 1:12:34

2M ago

Nicolay here, I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few basis points. If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems. If your LLM-powered app can’t survive a m…

#048 TAKEAWAYS Why Your AI Agents Need Permission to Act, Not Just Read 7:06

2M ago

Nicolay here, most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos. Today I have the chance to talk to Dexter Horthy, who recently put out a long piece called the “12-factor agents”. It’s like the 10 commandments, but for building agents. One …

#048 Why Your AI Agents Need Permission to Act, Not Just Read 57:02

2M ago

Nicolay here, most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos. Today I have the chance to talk to Dexter Horthy, who recently put out a long piece called the “12-factor agents”. It’s like the 10 commandments, but for building agents. One …

#047 Architecting Information for Search, Humans, and Artificial Intelligence 57:21

4M ago

Today on How AI Is Built, Nicolay Gerold sits down with Jorge Arango, an expert in information architecture. Jorge emphasizes that aligning systems with users' mental models is more important than optimizing backend logic alone. He shares a clear framework with four practical steps: Key Points: Information architecture should bridge user mental mod…

#046 Building a Search Database From First Principles 53:28

4M ago

Modern search is broken. There are too many pieces that are glued together: vector databases for semantic search, text engines for keywords, rerankers to fix the results, LLMs to understand queries, and metadata filters for precision. Each piece works well alone. Together, they often become a mess. When you glue these systems together, you create: Data Cons…

#045 RAG As Two Things - Prompt Engineering and Search 1:02:43

5M ago

John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn to more math and ML throughout his career. RAG Explained: "RAG is not a thing. RAG is two things." It breaks into: (1) Search - finding relevant information; (2) Prompt engineering - pre…

#044 Graphs Aren't Just For Specialists Anymore 1:03:34

5M ago

Kuzu is an embedded graph database that implements Cypher as a library. It can be easily integrated into various environments—from scripts and Android apps to serverless platforms. Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with perform…

#043 Knowledge Graphs Won't Fix Bad Data 1:10:58

5M ago

Metadata is the foundation of any enterprise knowledge graph. By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants. The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable. Juan Sequeda is a leading expert…

#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs 1:33:43

5M ago

Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph. Time is a critical—but often ignored—dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal…

#041 Context Engineering, How Knowledge Graphs Help LLMs Reason 1:33:34

6M ago

Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations. His team built As…

#040 Vector Database Quantization, Product, Binary, and Scalar 52:11

6M ago

When you store vectors, each number takes up 32 bits. With 1000 numbers per vector and millions of vectors, costs explode. A simple chatbot can cost thousands per month just to store and search through vectors. The Fix: Quantization Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization …
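The storage arithmetic in this description can be sketched in a few lines. This is a rough illustration, not the method discussed in the episode: the figure of one million vectors stands in for "millions", the function names are hypothetical, and the scalar quantizer assumes a known value range per vector.

```python
def storage_bytes(num_vectors: int, dims: int, bits_per_value: int) -> int:
    """Raw storage for the vector values alone (index overhead is ignored)."""
    return num_vectors * dims * bits_per_value // 8


def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an 8-bit integer code in 0..255."""
    scale = (hi - lo) / 255
    # Clamp to the assumed range first so codes never leave 0..255.
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vector]


def binary_quantize(vector):
    """Keep only the sign of each value: 1 bit per dimension."""
    return [1 if x > 0 else 0 for x in vector]


# Assumed workload: 1,000,000 vectors with 1000 dimensions each.
full = storage_bytes(1_000_000, 1000, 32)    # float32
int8 = storage_bytes(1_000_000, 1000, 8)     # scalar quantization
one_bit = storage_bytes(1_000_000, 1000, 1)  # binary quantization

print(full // int8, full // one_bit)
```

Under these assumptions, 8-bit scalar quantization cuts raw storage by a factor of 4 and binary quantization by a factor of 32, at the cost of some precision in each stored value.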

#039 Local-First Search, How to Push Search To End-Devices 53:08

6M ago

Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use." Core Mantra: "Simple, Local, Scalable" Why SQLite Vec? "I didn't go along thinking, 'Oh, I want to…

#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It 1:14:23

7M ago

Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine them. While RAG (Retrieval-Augmented Generation) has become a buzzword in AI, Trey argues that the current understanding of "RAG" is overly simplified – it's actually a bi…

#037 Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces 49:12

7M ago

Today we are back, continuing our series on search. We are talking to Brandon Smith about his work for Chroma. He led one of the largest studies in the field on different chunking techniques. So today we will look at how we can unfuck our RAG systems from badly chosen chunking hyperparameters. The biggest lie in RAG is that semantic search is simpl…

#036 How AI Can Start Teaching Itself - Synthetic Data Deep Dive 48:10

7M ago

Most LLMs you use today already use synthetic data. It’s not a thing of the future. The large labs use a large model (e.g. gpt-4o) to generate training data for a smaller one (gpt-4o-mini). This lets you build fast, cheap models that do one thing well. This is “distillation”. But the vision for synthetic data is much bigger. Enable people to train …

#035 A Search System That Learns As You Use It (Agentic RAG) 45:29

7M ago

Modern RAG systems build on flexibility. At their core, they match each query with the best tool for the job. They know which tool fits each task. When you ask about sales numbers, they reach for SQL. When you need company policies, they use vector search or BM25. The key is switching tools smoothly. A question about sales figures might need SQL…
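The routing idea in this description can be illustrated with a toy dispatcher. This is only a sketch: a real agentic system would probably let an LLM or a trained classifier pick the tool, and the keyword set, tool functions, and names below are all hypothetical stand-ins.

```python
from typing import Callable, Dict


def run_sql(query: str) -> str:
    """Hypothetical stand-in for a text-to-SQL tool."""
    return f"SQL: {query}"


def run_text_search(query: str) -> str:
    """Hypothetical stand-in for vector search or BM25 over documents."""
    return f"SEARCH: {query}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "sql": run_sql,
    "search": run_text_search,
}

# Assumed hint words that signal a question about structured data.
STRUCTURED_HINTS = {"sales", "revenue", "numbers", "figures", "total"}


def choose_tool(query: str) -> str:
    """Pick 'sql' when the query mentions structured-data hints, else 'search'."""
    words = set(query.lower().split())
    return "sql" if words & STRUCTURED_HINTS else "search"


def answer(query: str) -> str:
    """Route the query to the chosen tool and return its result."""
    return TOOLS[choose_tool(query)](query)


print(answer("What were the sales numbers last quarter"))
print(answer("What is our vacation policy"))
```

The keyword check is deliberately crude; the point is only that tool selection happens per query, before any retrieval runs.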

#034 Rethinking Search Inside Postgres, From Lexemes to BM25 47:15

8M ago

Many companies use Elastic or OpenSearch and use only 10% of the capacity. They have to build ETL pipelines, get data normalized, and worry about race conditions. All in all, at the moment, when you want to do search on top of your transactional data, you are forced to build a distributed system. Not anymore. ParadeDB is building an open-source PostgreSQL …

#033 RAG's Biggest Problems & How to Fix It (ft. Synthetic Data) 51:25

8M ago

RAG isn't a magic fix for search problems. While it works well at first, most teams find it's not good enough for production out of the box. The key is to make it better step by step, using good testing and smart data creation. Today, we are talking to Saahil Ognawala from Jina AI to start to understand RAG. To build a good RAG system, you need thr…

#032 Improving Documentation Quality for RAG Systems 46:36

8M ago

Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built a lot of interesting experiments with LLMs on using …

#031 BM25 As The Workhorse Of Search; Vectors Are Its Visionary Cousin 54:04

8M ago

Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25, short for Best Match 25, uses term frequency (TF) and …
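For readers who want to see the scoring idea concretely, here is a minimal sketch of the widely used Okapi BM25 formula. It is not GitHub's implementation: the whitespace tokenizer, the sample documents, and the parameter values k1=1.2 and b=0.75 (common defaults) are assumptions, and the IDF term uses the smoothed variant that stays non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "bm25 ranks documents by term frequency",
    "vector search uses embeddings",
    "term frequency and document frequency",
]
scores = bm25_scores("vector search", docs)
print(scores.index(max(scores)))
```

Only the second document contains the query terms, so it is the only one with a non-zero score.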

#030 Vector Search at Scale, Why One Size Doesn't Fit All 36:25

9M ago

Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this. Charles Xie, founder of Zilliz (company behind Milvus), shares how they solved vector database scaling challenges at 100B+ vector scale: Key Insights: Multi-tier storage strategy: GPU memory (1% o…

#029 Search Systems at Scale, Avoiding Local Maxima and Other Engineering Lessons 54:46

9M ago

Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer. While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results. The architecture typically splits into three core components: ing…

#028 Training Multi-Modal AI, Inside the Jina CLIP Embedding Model 49:21

9M ago

Today we are talking to Michael Günther, a senior machine learning scientist at Jina, about his work on Jina CLIP. Some key points: Uni-modal embeddings convert a single type of input (text, images, audio) into vectors. Multimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., sear…

#027 Building the database for AI, Multi-modal AI, Multi-modal Storage 44:53

9M ago

Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what…

#026 Embedding Numbers, Categories, Locations, Images, Text, and The World 46:43

10M ago

Today’s guest is Mór Kapronczay. Mór is the Head of ML at Superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems, where they turn anything into embeddings. When most people think about embeddings, they think about ada, OpenAI. You just take your text and throw it in there. But that’s too crude…

#025 Data Models to Remove Ambiguity from AI and Search 58:39

10M ago

Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies. That’s what you will learn today in this episode of How AI Is Built. Taxonomies, ontologies, knowledge graphs. Everyone is talking about them, but no one knows how to build them. But before we look …

#024 How ColPali is Changing Information Retrieval 54:56

10M ago

ColPali makes us rethink how we approach document processing. ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods. In this episode, J…

#023 The Power of Rerankers in Modern Search 42:28

10M ago

Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify, deduplicate documents, prioritize LLM outputs, and delve into models like ColBERT. We discuss: The role of rerankers in ret…

#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It) 46:05

10M ago

Text embeddings have limitations when it comes to handling long documents and out-of-domain data. Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at …

#021 The Problems You Will Encounter With RAG At Scale And How To Prevent (or fix) Them 50:08

10M ago

Hey! Welcome back. Today we look at how we can get our RAG system ready for scale. We discuss common problems and their solutions, when you introduce more users and more requests to your system. For this we are joined by Nirant Kasliwal, the author of fastembed. Nirant shares practical insights on metadata extraction, evaluation strategies, and eme…

#020 The Evolution of Search, Finding Search Signals, GenAI Augmented Retrieval 52:15

11M ago

In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author of “Relevant Search”. They discuss how methods and technologies, including large language models (LLMs) and semantic search, contribute to relevant search results. Key Highlights: Defining relevance is challenging and depends heavily …

#019 Data-driven Search Optimization, Analysing Relevance 51:13

11M ago

In this episode, we talk data-driven search optimizations with Charlie Hull. Charlie is a search expert from Open Source Connections. He has built Flax, one of the leading open source search companies in the UK, has written “Searching the Enterprise”, and is one of the main voices on data-driven search. We discuss strategies to improve search syste…

#018 Query Understanding: Doing The Work Before The Query Hits The Database 53:01

11M ago

Welcome back to How AI Is Built. We have got a very special episode to kick off season two. Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in the field of information retrieval, recommender systems, and AI-powered search. He has worked for Canva, Algolia, Cisco, Gartner, and Handshake, to name a few. His core focus i…

Season 2 Trailer: Mastering Search 4:15

12M ago

Today we are launching season 2 of How AI Is Built. Over the last few weeks, we spoke to a lot of regular listeners and past guests, collected feedback, and analyzed our episode data. We will be applying the learnings to season 2. This season will be all about search. We are trying to make it better, more actionable, and more in-depth. The goal i…

#017 Unlocking Value from Unstructured Data, Real-World Applications of Generative AI 36:27

1y ago

In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable …

#016 Data Processing for AI, Integrating AI into Data Pipelines, Spark 46:25

1y ago

This episode of "How AI Is Built" is all about data processing for AI. Abhishek Choudhary and Nicolay discuss Spark and alternatives to process data so it is AI-ready. Spark is a distributed system that allows for fast data processing by utilizing memory. It uses a distributed data representation, the "RDD" (Resilient Distributed Dataset), to simplify data processing. When should you use Spar…

#015 Building AI Agents for the Enterprise, Agent Cost Controls, Seamless UX 35:11

1y ago

In this episode, Nicolay talks with Rahul Parundekar, founder of AI Hero, about the current state and future of AI agents. Drawing from over a decade of experience working on agent technology at companies like Toyota, Rahul emphasizes the importance of focusing on realistic, bounded use cases rather than chasing full autonomy. They dive into the ke…

#014 Building Predictable Agents through Prompting, Compression, and Memory Strategies 32:13

1y ago

In this conversation, Nicolay and Richmond Alake discuss various topics related to building AI agents and using MongoDB in the AI space. They cover the use of agents and multi-agents, the challenges of controlling agent behavior, and the importance of prompt compression. When you are building agents, build them iteratively. Start with simple LLM ca…

Data Integration and Ingestion for AI & LLMs, Architecting Data Flows | changelog 3 14:52

1y ago

In this episode, Kirk Marple, CEO and founder of Graphlit, shares his expertise on building efficient data integrations. Kirk breaks down his approach using relatable concepts: The "Two-Sided Funnel": This model streamlines data flow by converting various data sources into a standard format before distributing it. Universal Data Streams: Kirk expla…

#013 ETL for LLMs, Integrating and Normalizing Unstructured Data 36:47

1y ago

In our latest episode, we sit down with Derek Tu, Founder and CEO of Carbon, a cutting-edge ETL tool designed specifically for large language models (LLMs). Carbon is streamlining AI development by providing a platform for integrating unstructured data from various sources, enabling businesses to build innovative AI applications more efficiently wh…

#012 Serverless Data Orchestration, AI in the Data Stack, AI Pipelines 28:05

1y ago

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater. Orchestra is a serverless data orches…

#011 Mastering Vector Databases, Product & Binary Quantization, Multi-Vector Search 40:05

1y ago

Ever wondered how AI systems handle images and videos, or how they make lightning-fast recommendations? Tune in as Nicolay chats with Zain Hassan, an expert in vector databases from Weaviate. They break down complex topics like quantization, multi-vector search, and the potential of multimodal search, making them accessible for all listeners. Zain …

#010 Building Robust AI and Data Systems, Data Architecture, Data Quality, Data Storage 45:32

1y ago

In this episode of "How AI is Built", data architect Anjan Banerjee provides an in-depth look at the world of data architecture and building complex AI and data systems. Anjan breaks down the basics using simple analogies, explaining how data architecture involves sorting, cleaning, and painting a picture with data, much like organizing Lego bricks…

#009 Modern Data Infrastructure for Analytics and AI, Lakehouses, Open Source Data Stack 27:52

1y ago

Jorrit Sandbrink, a data engineer specializing in open table formats, discusses the advantages of decoupling storage and compute, the importance of choosing the right table format, and strategies for optimizing your data pipelines. This episode is full of practical advice for anyone looking to build a high-performance data analytics platform. Lake …