Go offline with the Player FM app!
Beyond Guardrails: Defending LLMs Against Sophisticated Attacks
Manage episode 484162025 series 2570898
Jason Martin is an AI Security Researcher at HiddenLayer. This episode explores “policy puppetry,” a universal attack technique bypassing safety features in all major language models using structured formats like XML or JSON.
Subscribe to the Gradient Flow Newsletter 📩 https://gradientflow.substack.com/
Subscribe: Apple · Spotify · Overcast · Pocket Casts · AntennaPod · Podcast Addict · Amazon · RSS.
Detailed show notes - with links to many references - can be found on The Data Exchange web site.
285 episodes
Manage episode 484162025 series 2570898
Jason Martin is an AI Security Researcher at HiddenLayer. This episode explores “policy puppetry,” a universal attack technique bypassing safety features in all major language models using structured formats like XML or JSON.
Subscribe to the Gradient Flow Newsletter 📩 https://gradientflow.substack.com/
Subscribe: Apple · Spotify · Overcast · Pocket Casts · AntennaPod · Podcast Addict · Amazon · RSS.
Detailed show notes - with links to many references - can be found on The Data Exchange web site.
285 episodes
All episodes
×Welcome to Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.