Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

#047 Architecting Information for Search, Humans, and Artificial Intelligence
57:22
Today on How AI Is Built, Nicolay Gerold sits down with Jorge Arango, an expert in information architecture. Jorge emphasizes that aligning systems with users' mental models is more important than optimizing backend logic alone. He shares a clear framework with four practical steps: Key Points: Information architecture should bridge user mental mod…

#046 Building a Search Database From First Principles
53:29
Modern search is broken. There are too many pieces that are glued together: vector databases for semantic search, text engines for keywords, rerankers to fix the results, LLMs to understand queries, and metadata filters for precision. Each piece works well alone. Together, they often become a mess. When you glue these systems together, you create: Data Cons…

#045 RAG As Two Things - Prompt Engineering and Search
1:02:44
John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn to more math and ML throughout his career. RAG Explained: "RAG is not a thing. RAG is two things." It breaks into: Search - finding relevant information; Prompt engineering - pre…

#044 Graphs Aren't Just For Specialists Anymore
1:03:35
Kuzu is an embedded graph database that implements Cypher as a library. It can be easily integrated into various environments, from scripts and Android apps to serverless platforms. Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with perform…

#043 Knowledge Graphs Won't Fix Bad Data
1:10:59
Metadata is the foundation of any enterprise knowledge graph. By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants. The goal is to achieve economies of scale, making data reusable, traceable, and ultimately more valuable. Juan Sequeda is a leading expert…

#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs
1:33:44
Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems, from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph. Time is a critical (but often ignored) dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal…

#041 Context Engineering, How Knowledge Graphs Help LLMs Reason
1:33:35
Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations. His team built As…

#040 Vector Database Quantization, Product, Binary, and Scalar
52:12
When you store vectors, each number takes up 32 bits. With 1000 numbers per vector and millions of vectors, costs explode. A simple chatbot can cost thousands per month just to store and search through vectors. The Fix: Quantization. Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization …

#039 Local-First Search, How to Push Search To End-Devices
53:09
Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use." Core Mantra: "Simple, Local, Scalable". Why SQLite Vec? "I didn't go along thinking, 'Oh, I want to…

#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It
1:14:24
Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine them. While RAG (Retrieval-Augmented Generation) has become a buzzword in AI, Trey argues that the current understanding of "RAG" is overly simplified: it's actually a bi…

#037 Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces
49:13
Today we are back, continuing our series on search. We are talking to Brandon Smith about his work for Chroma. He led one of the largest studies in the field on different chunking techniques. So today we will look at how we can unfuck our RAG systems from badly chosen chunking hyperparameters. The biggest lie in RAG is that semantic search is simpl…

#036 How AI Can Start Teaching Itself - Synthetic Data Deep Dive
48:11
Most LLMs you use today already use synthetic data. It’s not a thing of the future. The large labs use a large model (e.g. gpt-4o) to generate training data for a smaller one (gpt-4o-mini). This lets you build fast, cheap models that do one thing well. This is “distillation”. But the vision for synthetic data is much bigger. Enable people to train …

#035 A Search System That Learns As You Use It (Agentic RAG)
45:30
Modern RAG systems build on flexibility. At their core, they match each query with the best tool for the job. They know which tool fits each task. When you ask about sales numbers, they reach for SQL. When you need company policies, they use vector search or BM25. The key is switching tools smoothly. A question about sales figures might need SQL…

#034 Rethinking Search Inside Postgres, From Lexemes to BM25
47:16
Many companies use Elastic or OpenSearch and use 10% of the capacity. They have to build ETL pipelines, get data normalized, and worry about race conditions. At the moment, when you want to do search on top of your transactional data, you are forced to build a distributed system. Not anymore. ParadeDB is building an open-source PostgreSQL …

#033 RAG's Biggest Problems & How to Fix It (ft. Synthetic Data)
51:26
RAG isn't a magic fix for search problems. While it works well at first, most teams find it's not good enough for production out of the box. The key is to make it better step by step, using good testing and smart data creation. Today, we are talking to Saahil Ognawala from Jina AI to start to understand RAG. To build a good RAG system, you need thr…

#032 Improving Documentation Quality for RAG Systems
46:37
Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors; it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built a lot of interesting experiments with LLMs on using …

#031 BM25 As The Workhorse Of Search; Vectors Are Its Visionary Cousin
54:05
Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25, short for Best Matching 25, uses term frequency (TF) and …

#030 Vector Search at Scale, Why One Size Doesn't Fit All
36:26
Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone: even tech giants struggle with this. Charles Xie, founder of Zilliz (company behind Milvus), shares how they solved vector database scaling challenges at 100B+ vector scale: Key Insights: Multi-tier storage strategy: GPU memory (1% o…

#029 Search Systems at Scale, Avoiding Local Maxima and Other Engineering Lessons
54:47
Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer. While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results. The architecture typically splits into three core components: ing…

#028 Training Multi-Modal AI, Inside the Jina CLIP Embedding Model
49:22
Today we are talking to Michael Günther, a senior machine learning scientist at Jina, about his work on Jina CLIP. Some key points: Uni-modal embeddings convert a single type of input (text, images, audio) into vectors. Multimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., sear…

#027 Building the database for AI, Multi-modal AI, Multi-modal Storage
44:54
Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what…

#026 Embedding Numbers, Categories, Locations, Images, Text, and The World
46:44
Today’s guest is Mór Kapronczay. Mór is the Head of ML at Superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems, where they turn anything into embeddings. When most people think about embeddings, they think about ada, OpenAI. You just take your text and throw it in there. But that’s too crude…

#025 Data Models to Remove Ambiguity from AI and Search
58:40
Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies. That’s what you will learn today in this episode of How AI Is Built. Taxonomies, ontologies, knowledge graphs. Everyone is talking about them, but no one knows how to build them. But before we look …

#024 How ColPali is Changing Information Retrieval
54:57
ColPali makes us rethink how we approach document processing. ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods. In this episode, J…

#023 The Power of Rerankers in Modern Search
42:29
Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify, deduplicate documents, prioritize LLM outputs, and delve into models like ColBERT. We discuss: The role of rerankers in ret…

#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It)
46:06
Text embeddings have limitations when it comes to handling long documents and out-of-domain data. Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at …

#021 The Problems You Will Encounter With RAG At Scale And How To Prevent (or fix) Them
50:09
Hey! Welcome back. Today we look at how we can get our RAG system ready for scale. We discuss common problems that appear when you introduce more users and more requests to your system, and their solutions. For this we are joined by Nirant Kasliwal, the author of fastembed. Nirant shares practical insights on metadata extraction, evaluation strategies, and eme…

#020 The Evolution of Search, Finding Search Signals, GenAI Augmented Retrieval
52:16
In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author of “Relevant Search”. They discuss how methods and technologies, including large language models (LLMs) and semantic search, contribute to relevant search results. Key Highlights: Defining relevance is challenging and depends heavily …

#019 Data-driven Search Optimization, Analysing Relevance
51:14
In this episode, we talk data-driven search optimizations with Charlie Hull. Charlie is a search expert from Open Source Connections. He has built Flax, one of the leading open source search companies in the UK, has written “Searching the Enterprise”, and is one of the main voices on data-driven search. We discuss strategies to improve search syste…

#018 Query Understanding: Doing The Work Before The Query Hits The Database
53:02
Welcome back to How AI Is Built. We have got a very special episode to kick off season two. Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in the field of information retrieval, recommender systems, and AI-powered search. He has worked for Canva, Algolia, Cisco, Gartner, and Handshake, to name a few. His core focus i…
Today we are launching season 2 of How AI Is Built. Over the last few weeks, we spoke to a lot of regular listeners and past guests, collected feedback, and analyzed our episode data. We will be applying the learnings to season 2. This season will be all about search. We are trying to make it better, more actionable, and more in-depth. The goal i…

#017 Unlocking Value from Unstructured Data, Real-World Applications of Generative AI
36:28
In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable …

#016 Data Processing for AI, Integrating AI into Data Pipelines, Spark
46:26
This episode of "How AI Is Built" is all about data processing for AI. Abhishek Choudhary and Nicolay discuss Spark and alternatives to process data so it is AI-ready. Spark is a distributed system that allows for fast data processing by utilizing memory. It uses a dataframe representation "RDD" to simplify data processing. When should you use Spar…

#015 Building AI Agents for the Enterprise, Agent Cost Controls, Seamless UX
35:12
In this episode, Nicolay talks with Rahul Parundekar, founder of AI Hero, about the current state and future of AI agents. Drawing from over a decade of experience working on agent technology at companies like Toyota, Rahul emphasizes the importance of focusing on realistic, bounded use cases rather than chasing full autonomy. They dive into the ke…

#014 Building Predictable Agents through Prompting, Compression, and Memory Strategies
32:14
In this conversation, Nicolay and Richmond Alake discuss various topics related to building AI agents and using MongoDB in the AI space. They cover the use of agents and multi-agents, the challenges of controlling agent behavior, and the importance of prompt compression. When you are building agents, build them iteratively. Start with simple LLM ca…

Data Integration and Ingestion for AI & LLMs, Architecting Data Flows | changelog 3
14:53
In this episode, Kirk Marple, CEO and founder of Graphlit, shares his expertise on building efficient data integrations. Kirk breaks down his approach using relatable concepts: The "Two-Sided Funnel": This model streamlines data flow by converting various data sources into a standard format before distributing it. Universal Data Streams: Kirk expla…

#013 ETL for LLMs, Integrating and Normalizing Unstructured Data
36:48
In our latest episode, we sit down with Derek Tu, Founder and CEO of Carbon, a cutting-edge ETL tool designed specifically for large language models (LLMs). Carbon is streamlining AI development by providing a platform for integrating unstructured data from various sources, enabling businesses to build innovative AI applications more efficiently wh…

#012 Serverless Data Orchestration, AI in the Data Stack, AI Pipelines
28:06
In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater. Orchestra is a serverless data orches…

#011 Mastering Vector Databases, Product & Binary Quantization, Multi-Vector Search
40:06
Ever wondered how AI systems handle images and videos, or how they make lightning-fast recommendations? Tune in as Nicolay chats with Zain Hassan, an expert in vector databases from Weaviate. They break down complex topics like quantization, multi-vector search, and the potential of multimodal search, making them accessible for all listeners. Zain …

#010 Building Robust AI and Data Systems, Data Architecture, Data Quality, Data Storage
45:33
In this episode of "How AI is Built", data architect Anjan Banerjee provides an in-depth look at the world of data architecture and building complex AI and data systems. Anjan breaks down the basics using simple analogies, explaining how data architecture involves sorting, cleaning, and painting a picture with data, much like organizing Lego bricks…

#009 Modern Data Infrastructure for Analytics and AI, Lakehouses, Open Source Data Stack
27:53
Jorrit Sandbrink, a data engineer specializing in open table formats, discusses the advantages of decoupling storage and compute, the importance of choosing the right table format, and strategies for optimizing your data pipelines. This episode is full of practical advice for anyone looking to build a high-performance data analytics platform. Lake …

#008 Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models
36:40
Kirk Marple, CEO and founder of Graphlit, discusses the evolution of his company from a data cataloging tool to a platform designed for ETL (Extract, Transform, Load) and knowledge retrieval for Large Language Models (LLMs). Graphlit empowers users to build custom applications on top of its API that go beyond naive RAG. Key Points: Knowledge Graph…

#007 Navigating the Modern Data Stack, Choosing the Right OSS Tools, From Problem to Requirements to Architecture
38:12
From Problem to Requirements to Architecture. In this episode, Nicolay Gerold and Jon Erich Kemi Warghed discuss the landscape of data engineering, sharing insights on selecting the right tools, implementing effective data governance, and leveraging powerful concepts like software-defined assets. They discuss the challenges of keeping up with the e…

#006 Data Orchestration Tools, Choosing the right one for your needs
32:37
In this episode, Nicolay Gerold interviews John Wessel, the founder of Agreeable Data, about data orchestration. They discuss the evolution of data orchestration tools, the popularity of Apache Airflow, the crowded market of orchestration tools, and the key problem that orchestrators solve. They also explore the components of a data orchestrator, t…

#005 Building Reliable LLM Applications, Production-Ready RAG, Data-Driven Evals
29:40
In this episode of "How AI is Built", we learn how to build and evaluate real-world language model applications with Shahul and Jithin, creators of Ragas. Ragas is a powerful open-source library that helps developers test, evaluate, and fine-tune Retrieval Augmented Generation (RAG) applications, streamlining their path to production readiness. Mai…

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2
21:33
In this episode of Changelog, Weston Pace dives into the latest updates to LanceDB, an open-source vector database and file format. Lance's new V2 file format redefines the traditional notion of columnar storage, allowing for more efficient handling of large multimodal datasets like images and embeddings. Weston discusses the goals driving LanceDB'…

#004 AI with Supabase, Postgres Configuration, Real-Time Processing, and more
31:57
Had a fantastic conversation with Christopher Williams, Solutions Architect at Supabase, about setting up Postgres the right way for AI. We dug deep into Supabase, exploring: core components and how they power real-time AI solutions; optimizing Postgres for AI workloads; the magic of pgvector and other key extensions; Supabase’s future and exciting n…

#003 AI Inside Your Database, Real-Time AI, Declarative ML/AI
36:04
If you've ever wanted a simpler way to integrate AI directly into your database, SuperDuperDB might be the answer. SuperDuperDB lets you easily apply AI processes to your data while keeping everything up-to-date with real-time calculations. It works with various databases and aims to make AI development less of a headache. In this podcast, we explo…

Supabase acquires OrioleDB, A New Database Engine for PostgreSQL | changelog 1
13:37
Supabase just acquired OrioleDB, a storage engine for PostgreSQL. Oriole gets creative with MVCC! It uses an UNDO log rather than keeping multiple versions of an entire data row (tuple). This means when you update data, Oriole tracks the changes needed to "undo" the update if necessary. Think of this like the "undo" function in a text editor. Inste…

#002 AI Powered Data Transformation, Combining gen & trad AI, Semantic Validation
37:09
Today’s guest is Antonio Bustamante, a serial entrepreneur who previously built Kite and Silo and is now working to fix bad data. He is building bem, the data tool to transform any data into the schema your AI and software needs. bem.ai is a data tool that focuses on transforming any data into the schema needed for AI and software. It acts as a sys…