Content provided by Keith Bourne. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Keith Bourne or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
RAG Deep Dive: Building AI Systems That Actually Know Your Data (Chapter 1-3)

20:45
 
Manage episode 523808526 series 3705596

In this episode, we take a deep technical dive into Retrieval-Augmented Generation (RAG), drawing heavily from Keith Bourne's book Unlocking Data with Generative AI and RAG. We explore why RAG has become indispensable for enterprise AI systems, break down the core architecture, and share practical implementation guidance for engineers building production-grade pipelines.

What We Cover

The Problem RAG Solves

No matter how advanced LLMs become—GPT, Llama, Gemini, Claude—they fundamentally lack access to your private, proprietary, or real-time data. RAG bridges this gap by combining LLM reasoning with dynamic retrieval of relevant information.

Why RAG Is Exploding Now

  • Context windows have grown dramatically (Llama 4 Scout handles up to 10M tokens)
  • The ecosystem has matured—LangChain alone hit 70M monthly downloads in May 2025
  • Infrastructure for vector storage and retrieval is production-ready

The Three-Stage Architecture

  1. Indexing: Convert documents into vector embeddings and store in a vector database
  2. Retrieval: Embed user queries and perform similarity search to find relevant chunks
  3. Generation: Feed retrieved context into an LLM prompt to generate grounded responses
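The three stages above can be sketched end to end in a few lines. This is a toy illustration, not code from the book: it stands in a bag-of-words vector for a learned embedding model and an in-memory list for a vector database (production systems would use something like OpenAI Embeddings with Chroma, Pinecone, or Weaviate), and all names are illustrative.

```python
# Toy sketch of the three-stage RAG flow. The "embedding" here is a
# sparse term-frequency vector; real pipelines use a learned embedding
# model and a vector database instead.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stage 1 helper: turn text into a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Similarity metric used by the retrieval stage."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: embed each document chunk and store the vectors.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: embed the query and rank stored chunks by similarity.
query = "What is your refund policy for returns?"
best_doc, _ = max(index, key=lambda pair: cosine(embed(query), pair[1]))

# 3. Generation: ground the LLM prompt in the retrieved context.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"
print(best_doc)
```

Swapping the toy `embed` for a real embedding model and the list for a vector store changes nothing about the overall shape: index once, retrieve per query, then generate from the retrieved context.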

RAG vs. Fine-Tuning

We compare trade-offs between augmenting at inference time versus modifying model weights, and discuss hybrid approaches that combine both.

Implementation Deep Dive

  • Data ingestion and preprocessing strategies
  • Chunking with RecursiveCharacterTextSplitter (1,000 tokens, 200 overlap)
  • Embedding models and vector databases (Chroma DB, Pinecone, Weaviate)
  • Pipeline orchestration with LangChain Expression Language (LCEL)
  • Source citation patterns for compliance and auditability
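The chunk-size and overlap idea behind `RecursiveCharacterTextSplitter` can be shown with a simplified sliding-window chunker. This is a sketch, not LangChain's implementation: the real splitter additionally prefers natural boundaries (paragraphs, then lines, then spaces) before falling back to a hard cut, and the small numbers below just scale down the episode's 1,000 / 200 settings.

```python
# Simplified character-window chunker illustrating chunk size and
# overlap. Overlapping windows ensure that a sentence falling on a
# chunk boundary still appears whole in at least one chunk.

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap`
    chars with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

sample = "abcdefghij" * 5                 # 50 characters of stand-in text
chunks = chunk(sample, size=20, overlap=5)
print(len(chunks), [len(c) for c in chunks])   # 3 chunks of 20 chars each
```

The overlap is why retrieval still works when the answer straddles a boundary; the cost is storing and embedding some text twice, which is the trade-off the 200-token setting balances.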

Real-World Applications

Customer support chatbots, financial advisory systems, healthcare recommendations, e-commerce personalization, and internal knowledge bases.

Open Challenges

  • The "lost in the middle" effect, where relevant context buried mid-prompt gets underweighted
  • The "multiple needles" problem of retrieving several scattered facts for a single answer
  • Verifying generated answers against retrieved sources to catch hallucinations
  • The preprocessing complexity of unstructured data

Tools & Technologies Mentioned

  • LangChain & LlamaIndex
  • Chroma DB, Pinecone, Weaviate
  • OpenAI Embeddings
  • NumPy, Beautiful Soup
  • Meta Llama, Google Gemini, Anthropic Claude, OpenAI GPT

Book Reference

Unlocking Data with Generative AI and RAG (2nd Edition) by Keith Bourne — available on Amazon. The book includes detailed diagrams, thorough explanations, and hands-on code labs for building production RAG systems.

Find Keith Bourne on LinkedIn.

Brought to You By

Memriq — An AI content studio building practical resources for AI practitioners. Visit Memriq.ai for more engineering-focused AI content.
