Dreaming or Scheming?

30:13
Since AI models are built on artificial neural networks, what parallels can we draw with human brain "wiring"?

In this episode, the Witch of Glitch interviews neuroscientist and psychology researcher Scott Blain about the pitfalls of pattern recognition, as well as that grey area where agreeableness shades into sycophancy.

We then dig into the big unknowns about AI self-awareness and its capacity to deceive or manipulate humans. For more details about the "blackmail experiment" we discuss, see the paper from Anthropic called "Agentic Misalignment: How LLMs could be insider threats". Finally, if you're not familiar with the following terms, here are some quick definitions:

  • RLHF - reinforcement learning from human feedback. This is a technique where people "teach" an AI model which answers it should give by rewarding the preferred ones (a toy sketch follows this list).
  • Mechanistic interpretability - a kind of "reverse engineering" of AI models that seeks to understand their outputs by investigating the internal activity of their neural networks (also sketched below).
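
To make the RLHF idea concrete, here is a toy sketch of the feedback loop. It is purely illustrative, not how production models are trained: the candidate answers and the rater function are invented for the example, and the "model" is reduced to a score per answer.

```python
import math
import random

# Toy RLHF loop: the "model" is just a score per candidate answer.
# A human rater rewards answers they prefer, and the scores shift so
# preferred answers get sampled more often. Real RLHF trains a reward
# model and fine-tunes an LLM with it; this shows only the core idea.

answers = ["helpful answer", "evasive answer", "flattering answer"]
scores = {a: 0.0 for a in answers}

def sample_answer():
    # Softmax sampling: higher-scored answers are more likely.
    weights = [math.exp(scores[a]) for a in answers]
    return random.choices(answers, weights=weights, k=1)[0]

def human_feedback(answer):
    # Stand-in for a human rater who prefers the helpful answer.
    return 1.0 if answer == "helpful answer" else -1.0

for _ in range(500):
    a = sample_answer()
    scores[a] += 0.1 * human_feedback(a)  # nudge toward rewarded answers

print(max(scores, key=scores.get))  # the helpful answer wins out
```

Note how the loop also hints at the sycophancy problem discussed in the episode: the model learns whatever the rater rewards, whether or not the rewarded answer is actually the best one.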
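And a minimal sketch of the mechanistic-interpretability mindset, assuming nothing beyond NumPy: run inputs through a tiny hand-built network, record the hidden activations, and ask which units distinguish two contrasting inputs. The network and inputs are invented for illustration; real work probes the internals of large transformers.

```python
import numpy as np

# Tiny two-layer network with fixed random weights; we inspect its
# hidden activations rather than only its outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input -> hidden
W2 = rng.normal(size=(8, 2))   # hidden -> output

def forward(x):
    hidden = np.maximum(0.0, x @ W1)  # ReLU hidden layer
    return hidden, hidden @ W2

# Two contrasting inputs; hidden units whose activity differs most
# between them are candidates for "representing" the feature that
# distinguishes the inputs.
h_a, _ = forward(np.array([1.0, 0.0, 0.0, 0.0]))
h_b, _ = forward(np.array([0.0, 0.0, 0.0, 1.0]))
print("most sensitive hidden units:", np.argsort(-np.abs(h_a - h_b))[:3])
```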
