Complete Beginner's Course On AI Evaluations: Step By Step (2025)

Content provided by Peter Yang.

Behind the Craft
Complete Beginner's Course on AI Evaluations: Step by Step (2025) | Aman Khan

Duration: 51:47

Today, I want to share a new episode with Aman Khan.

The best way to learn about AI evaluations is to watch two PMs build them live from scratch. In our new episode, Aman and I walk through creating evals for an AI customer support agent, from labeling a golden dataset to aligning LLM judges. This is the complete beginner's AI eval course you've been waiting for.

Aman and I talked about:

(00:00) What are AI evals and how to get good at them

(02:52) The 4 types of AI evaluations everyone should know

(06:08) Live demo: Building evals for a customer support agent

(10:29) Using Anthropic's console to generate great prompts

(15:13) Creating the evaluation criteria

(17:40) Adding human labels to the golden dataset

(31:05) Scaling evals with LLM-judge prompts

(38:21) How to align LLM judges with human judgment

Get the takeaways: https://creatoreconomy.so/p/complete-beginner-course-on-ai-evaluations-aman-khan

Where to find Aman:

LinkedIn: https://www.linkedin.com/in/amanberkeley/

Website: https://arize.com/

📌 Subscribe to this channel – more interviews coming soon!
