Dreaming or Scheming?
Since AI models are built on artificial neural networks, what parallels can we draw with human brain "wiring"?
In this episode, the Witch of Glitch interviews neuroscientist and psychology researcher Scott Blain about the pitfalls of pattern recognition, as well as that grey area where agreeableness shades into sycophancy.
We then dig into the big unknowns about AI self-awareness and its capacity to deceive or manipulate humans. For more details about the "blackmail experiment" we discuss, see the paper from Anthropic called "Agentic Misalignment: How LLMs could be insider threats". Finally, if you're not familiar with the following terms, here are some quick definitions:
- RLHF - reinforcement learning from human feedback. This is a technique where people "teach" an AI model which answers it should provide by rewarding the correct ones.
- Mechanistic interpretability - a kind of "reverse engineering" of AI models that seeks to understand their outputs by investigating the activity of their neural networks.
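To make the RLHF definition above concrete, here is a deliberately toy sketch of the core idea: human raters reward a preferred answer, and those rewards steer which answer the model favors. The dictionary of answers, the scoring scheme, and the function names are all invented for illustration; real RLHF trains a reward model and fine-tunes the policy with reinforcement learning, which this does not attempt.

```python
# Toy illustration of the RLHF idea, not a real training loop.
# A "model" scores candidate answers; human feedback nudges the
# score of the preferred answer upward, so the model learns to
# favor it. All names and values here are hypothetical.

scores = {"helpful answer": 0.0, "evasive answer": 0.0}

def human_feedback(preferred: str, reward: float = 1.0) -> None:
    """A human rater rewards the answer they judged correct."""
    scores[preferred] += reward

def best_answer() -> str:
    """The model outputs its highest-scoring answer."""
    return max(scores, key=scores.get)

# Raters repeatedly prefer the helpful answer...
for _ in range(3):
    human_feedback("helpful answer")

# ...so it becomes the model's top choice.
print(best_answer())
```

The same reward signal that teaches helpfulness is what the episode's sycophancy discussion turns on: if raters reward agreeable answers, agreeable answers are what the model learns to give.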