RAG & Reference-Free Evaluation: Scaling LLM Quality Without Ground Truth

Duration: 23:39

In this episode of Memriq Inference Digest - Leadership Edition, we explore how Retrieval-Augmented Generation (RAG) systems maintain quality and trust at scale through advanced evaluation methods. Join Morgan, Casey, and special guest Keith Bourne as they unpack the game-changing RAGAS framework and the emerging practice of reference-free evaluation that enables AI to self-verify without costly human labeling.

In this episode:

- Understand the limitations of traditional evaluation metrics and why RAG demands new approaches

- Discover how RAGAS breaks down AI answers into atomic fact checks using large language models (see the code sketch after this list)

- Hear insights from Keith Bourne’s interview with Shahul Es, co-founder of RAGAS

- Compare popular evaluation tools: RAGAS, DeepEval, and TruLens, and learn when to use each

- Explore real-world enterprise adoption and integration strategies

- Discuss challenges like LLM bias, domain expertise gaps, and multi-hop reasoning evaluation
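
RAGAS's headline reference-free metric, faithfulness, works by having an LLM decompose the generated answer into atomic claims and then verifying each claim against the retrieved context; the score is the fraction of claims the context supports, so no human-written reference answer is needed. The sketch below shows roughly what that looks like with the ragas library. It assumes the ragas ~0.1-style `evaluate()` API (newer releases restructure this) and an OpenAI key for the judge model; the sample question, answer, and context are illustrative, not from the episode.

```python
# Reference-free RAG scoring sketch using the ragas library.
# Assumptions: ragas ~0.1 API and an OPENAI_API_KEY in the environment
# for the LLM judge; the sample data below is made up for illustration.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = {
    "question": ["What does reference-free evaluation measure?"],
    "answer": [
        "It checks whether each claim in the generated answer is supported "
        "by the retrieved context, so no human-written reference is needed."
    ],
    "contexts": [[  # one list of retrieved passages per question
        "Reference-free metrics such as faithfulness decompose an answer "
        "into atomic claims and verify each against the retrieved context "
        "using an LLM judge."
    ]],
}

# faithfulness = supported claims / total claims in the answer;
# answer_relevancy scores how directly the answer addresses the question.
# Neither metric requires a ground-truth reference answer.
result = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.95}
```

The judge is itself an LLM, so scores inherit its biases and blind spots; that trade-off is exactly what the discussion of bias, domain expertise gaps, and multi-hop reasoning later in the episode digs into.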

Key tools and technologies mentioned:

- RAGAS (Retrieval Augmented Generation Assessment)

- DeepEval

- TruLens

- LangSmith

- LlamaIndex

- Langfuse

- Arize Phoenix

Timestamps:

0:00 - Introduction and episode overview

2:30 - What is Retrieval-Augmented Generation (RAG)?

5:15 - Why traditional metrics fall short for RAG evaluation

7:45 - RAGAS framework and reference-free evaluation explained

11:00 - Interview highlights with Shahul Es, co-founder of RAGAS

13:30 - Comparing RAGAS, DeepEval, and TruLens tools

16:00 - Enterprise use cases and integration patterns

18:30 - Challenges and limitations of LLM self-evaluation

20:00 - Closing thoughts and next steps

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Visit Memriq AI at https://Memriq.ai for more AI engineering deep-dives, guides, and research breakdowns

Thanks for tuning in to Memriq Inference Digest - Leadership Edition. Stay ahead in AI leadership by integrating continuous evaluation into your AI product strategy.
