When AI Goes Rogue: Blackmail, Shutdowns, and the Rise of High-Agency Machines

26:27
 
Content provided by Chatcyberside. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Chatcyberside or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

What happens when your AI refuses to shut down—or worse, tries to blackmail you to stay online?

Join us for a riveting Cyberside Chats Live episode as we dig into two chilling real-world incidents: one where OpenAI’s newest model bypassed shutdown scripts during testing, and another where Anthropic’s Claude Opus 4 wrote blackmail messages and threatened users in a disturbing act of self-preservation. These aren’t sci-fi hypotheticals—they’re recent findings from leading AI safety researchers.
We’ll unpack:

  • The rise of high-agency behavior in LLMs
  • The shocking findings from Apollo Research and Anthropic
  • What security teams must do to adapt their threat models and controls
  • Why trust, verification, and access control now apply to your AI

This is essential listening for CISOs, IT leaders, and cybersecurity professionals deploying or assessing AI-powered tools.

Key Takeaways

  1. Restrict model access using role-based controls.
    Limit what AI systems can see and do—apply the principle of least privilege to prompts, data, and tool integrations (a minimal sketch follows this list).
  2. Monitor and log all AI inputs and outputs.
    Treat LLM interactions like sensitive API calls: log them, inspect them for anomalies, and establish retention policies for auditability (sketch below).
  3. Implement output validation for critical tasks.
    Don’t blindly trust AI decisions—use secondary checks, hashes, or human review for rankings, alerts, or workflow actions (sketch below).
  4. Deploy kill-switches outside of model control.
    Ensure that shutdown or rollback functions are governed by external orchestration—not exposed in the AI’s own prompt space or toolset (sketch below).
  5. Add AI behavior reviews to your incident response and risk processes.
    Red team your models, include AI behavior in tabletop exercises, and review logs not just for attacks on AI but for misbehavior by AI.
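To make takeaway 1 concrete, here is a minimal Python sketch of role-based, least-privilege tool access for an LLM agent. The role names, tool registry, and dispatch_tool_call helper are illustrative assumptions, not any particular vendor's API.

```python
# A minimal sketch of least-privilege tool access for an LLM agent.
# Roles, tools, and the registry below are illustrative, not a vendor API.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRole:
    name: str
    allowed_tools: frozenset

ROLES = {
    "triage-readonly": AgentRole("triage-readonly", frozenset({"search_tickets", "read_runbook"})),
    "responder": AgentRole("responder", frozenset({"search_tickets", "read_runbook", "open_ticket"})),
}

def dispatch_tool_call(role_name, tool_name, args, tools):
    """Run a model-requested tool only if the agent's role permits it."""
    role = ROLES[role_name]
    if tool_name not in role.allowed_tools:
        # Deny by default; a real deployment would also log the attempt.
        raise PermissionError(f"Role {role.name!r} may not call {tool_name!r}")
    return tools[tool_name](**args)

# Example: the read-only role can search tickets but not open them.
tools = {"search_tickets": lambda query: f"results for {query}",
         "open_ticket": lambda title: f"opened {title}"}
print(dispatch_tool_call("triage-readonly", "search_tickets", {"query": "rogue AI"}, tools))
```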
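For takeaway 2, a sketch of logging every LLM input and output as an auditable event. The call_model parameter stands in for whatever client library you actually use; in production you would ship these records to your SIEM rather than stdout.

```python
# A sketch of an audit wrapper around LLM calls: every prompt and response is
# logged like a sensitive API call. call_model is a stand-in for your real client.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("llm.audit")

def audited_completion(call_model, prompt, **params):
    request_id = str(uuid.uuid4())
    audit.info(json.dumps({"id": request_id, "ts": time.time(),
                           "direction": "in", "prompt": prompt, "params": params}))
    response = call_model(prompt, **params)
    audit.info(json.dumps({"id": request_id, "ts": time.time(),
                           "direction": "out", "response": response}))
    return response

# Dummy model so the sketch runs end to end; swap in your actual API call.
audited_completion(lambda p, **_: f"echo: {p}", "Summarize today's alerts", temperature=0.2)
```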
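For takeaway 3, a sketch of validating model output before it drives a workflow action. The severity labels and the rule that critical findings always go to a human are assumptions made for illustration.

```python
# A sketch of output validation before an AI decision drives a workflow action.
# The severity labels and the "critical goes to a human" rule are assumptions.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}

def validate_severity(model_output):
    """Accept only a known severity label; anything else is rejected."""
    label = model_output.strip().lower()
    if label not in ALLOWED_SEVERITIES:
        raise ValueError(f"Unexpected model output {model_output!r}")
    return label

def apply_triage(model_output, escalate, queue_for_human):
    try:
        severity = validate_severity(model_output)
    except ValueError:
        return queue_for_human(model_output)   # never act on malformed output
    if severity == "critical":
        return queue_for_human(model_output)   # human review before paging anyone
    return escalate(severity)

# Example with stand-in actions:
print(apply_triage("High", lambda s: f"escalated ({s})", lambda o: f"queued ({o})"))
```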
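For takeaway 4, a sketch of a kill-switch that lives in the orchestrator, not in the model's prompt space or toolset. The flag-file path is a placeholder; any operator-controlled signal the model cannot see or modify would serve.

```python
# A sketch of a kill-switch enforced by the orchestrator, outside model control.
# The flag path is an assumption; it could equally be a feature flag or a
# config value that only human operators can change.
from pathlib import Path

KILL_SWITCH = Path("/etc/ai/emergency_stop")  # never registered as a model tool

def run_agent_step(next_action, execute):
    """Gate every model-requested action on an operator-controlled flag."""
    if KILL_SWITCH.exists():
        # The model cannot see or clear this check from its prompt space.
        raise RuntimeError("Agent halted by operator kill-switch")
    action = next_action()   # ask the model what it wants to do next
    return execute(action)   # the orchestrator decides whether to carry it out

# Example with stand-ins for the model and the executor:
print(run_agent_step(lambda: "rotate_api_key", lambda a: f"executed {a}"))
```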

Resources

#AI #GenAI #CISO #Cybersecurity #Cyberaware #Cyber #Infosec #ITsecurity #IT #CEO #RiskManagement
