Complete Beginner's Course on AI Evaluations: Step by Step (2025) | Aman Khan
Today, I want to share a new episode with Aman Khan.
The best way to learn about AI evaluations is to watch two PMs build them live from scratch. In our new episode, Aman and I walk through creating evals for an AI customer support agent, from labeling a golden dataset to aligning LLM judges. This is the complete beginner's AI eval course you've been waiting for.
Aman and I talked about:
(00:00) What are AI evals and how to get good at them
(02:52) The 4 types of AI evaluations everyone should know
(06:08) Live demo: Building evals for a customer support agent
(10:29) Using Anthropic's console to generate great prompts
(15:13) Creating the evaluation criteria
(17:40) Adding human labels to the golden dataset
(31:05) Scaling evals with LLM-judge prompts
(38:21) How to align LLM judges with human judgment
Get the takeaways: https://creatoreconomy.so/p/complete-beginner-course-on-ai-evaluations-aman-khan
Where to find Aman:
LinkedIn: https://www.linkedin.com/in/amanberkeley/
Website: https://arize.com/
📌 Subscribe to this channel – more interviews coming soon!
71 episodes