Shining a Light on the AI Black Box: Chain of Thought and Monitorability
We explore how monitoring AI reasoning can surface safety signals in critical decisions. Learn what monitorability means, why a perfect reasoning transcript isn't required, and how robust metrics and three evaluation modes (intervention, process, and outcome) help catch red flags. The episode also covers why bigger models aren't necessarily less transparent, the surprising roles of compute and reinforcement learning, and practical considerations such as the monitorability tax and targeted follow-up questions.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC