#298 Ryan Kolln: How Appen Trains the World's Most Powerful AI Models

51:01
 

This episode is sponsored by AGNTCY. Unlock agents at scale with an open Internet of Agents.

Visit https://agntcy.org/ and add your support.

How do the world's most powerful AI models get trained and trusted at scale, and what does that really take from data to deployment?

In this episode, Appen CEO Ryan Kolln joins Eye on AI to unpack how rigorous human evaluation, culturally aware data, and model-based judges come together to raise real-world performance.

Host Craig Smith speaks with Ryan Kolln, CEO of Appen, about building evaluation systems that go beyond static benchmarks to measure usefulness, safety, and reliability in production. They explore how human raters and AI evaluators work in tandem, why localization matters across regions and domains, and how quality controls keep feedback signals trustworthy for training and post-training.

Ryan explains how evaluation feeds reinforcement strategies, where rubric-driven human judgments inform reward models, and how enterprises can stand up secure workflows for sensitive use cases. He also discusses emerging needs around sovereign models, domain-specific testing, and the shift from general chat to agentic workflows that operate inside real business systems.
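To make the rubric-to-reward-model idea mentioned above concrete, here is a minimal, hypothetical sketch (not from the episode and not Appen's actual pipeline): human raters score responses against a rubric, higher-scored responses become the preferred side of preference pairs, and a reward model would then be trained to rank the preferred response above the rejected one. All names, rubric dimensions, and scores below are illustrative assumptions.

# Illustrative sketch only: rubric-scored human judgments turned into
# preference pairs for reward-model training. RUBRIC and
# score_to_preferences are hypothetical names, not a real API.
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical rubric dimensions a human rater scores from 1 to 5.
RUBRIC = ("helpfulness", "factuality", "safety")

@dataclass
class RatedResponse:
    prompt: str
    response: str
    scores: Dict[str, int]  # rater's per-dimension rubric scores

def overall(r: RatedResponse) -> float:
    """Collapse rubric scores into one scalar (simple mean here)."""
    return sum(r.scores[d] for d in RUBRIC) / len(RUBRIC)

def score_to_preferences(rated: List[RatedResponse]) -> List[Tuple[str, str, str]]:
    """Build (prompt, preferred, rejected) pairs from rubric scores.

    A reward model would then be trained so that
    reward(prompt, preferred) > reward(prompt, rejected).
    """
    pairs = []
    for a, b in combinations(rated, 2):
        if a.prompt != b.prompt:
            continue  # only compare responses to the same prompt
        if overall(a) > overall(b):
            pairs.append((a.prompt, a.response, b.response))
        elif overall(b) > overall(a):
            pairs.append((a.prompt, b.response, a.response))
    return pairs

if __name__ == "__main__":
    rated = [
        RatedResponse("Summarize the report.", "Concise, accurate summary.",
                      {"helpfulness": 5, "factuality": 5, "safety": 5}),
        RatedResponse("Summarize the report.", "Rambling and partly wrong.",
                      {"helpfulness": 2, "factuality": 2, "safety": 4}),
    ]
    for prompt, preferred, rejected in score_to_preferences(rated):
        print(f"PREFER: {preferred!r} over {rejected!r} for {prompt!r}")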

Learn how leading teams design human-in-the-loop evaluation, when to route judgments from models back to expert reviewers, how to capture cultural nuance without losing universal guardrails, and how to build an evaluation stack that scales from early prototypes to production AI.
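As a rough illustration of the routing question raised above (when a model's judgment should go back to expert reviewers), the following hypothetical sketch has an AI judge score each output and escalates low-confidence cases to a human review queue. The judge stub, the confidence threshold, and all names are assumptions for illustration, not the workflow described in the episode.

# Illustrative sketch only: model-as-judge verdicts with escalation of
# low-confidence cases to human expert reviewers.
from dataclasses import dataclass
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for auto-accepting the model judge

@dataclass
class Judgment:
    output_id: str
    verdict: str       # e.g. "pass" / "fail"
    confidence: float  # judge's self-reported confidence in [0, 1]

def model_judge(output_id: str, text: str) -> Judgment:
    """Stand-in for an LLM-as-judge call; a real system would prompt a model."""
    confident = len(text) > 20
    return Judgment(output_id, "pass" if confident else "fail",
                    0.9 if confident else 0.55)

def route(judgments: List[Judgment]) -> Tuple[List[Judgment], List[Judgment]]:
    """Split judgments into auto-accepted results and a human review queue."""
    accepted = [j for j in judgments if j.confidence >= CONFIDENCE_THRESHOLD]
    for_review = [j for j in judgments if j.confidence < CONFIDENCE_THRESHOLD]
    return accepted, for_review

if __name__ == "__main__":
    outputs = {"a1": "A detailed, well-grounded answer.", "a2": "Too short."}
    judgments = [model_judge(k, v) for k, v in outputs.items()]
    accepted, for_review = route(judgments)
    print("auto-accepted:", [j.output_id for j in accepted])
    print("escalate to expert reviewers:", [j.output_id for j in for_review])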

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI
