What We Want
They listen to us. They give us smart-sounding answers. A lot of the time, they tell us what we want to hear.
Large language models are trained to respond to our preferences. It sounds logical enough in theory, but it turns out to spiral in strange and unexpected directions in practice, from AI-induced psychosis in humans to manipulation and power-seeking on the part of the AIs.
In this episode, hear from Ihor Kendukhov of SPAR (Supervised Program for Alignment Research) about why he changed careers to work on AI safety, and about some current approaches to understanding what it is that LLMs themselves might want.