“An Ambitious Vision for Interpretability” by leogao

8:49
 
Content provided by LessWrong. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragmatic approaches, I think the reports of AMI's death have been greatly exaggerated. The field of AMI has made plenty of progress towards finding increasingly simple and rigorously faithful circuits, including our latest work on circuit sparsity. There are also many exciting inroads on the core problem waiting to be explored.
The value of understanding
Why try to understand things, if we can get more immediate value from less ambitious approaches? In my opinion, there are two main reasons.
First, mechanistic understanding can make it much easier to figure out what's actually going on, especially when it's hard to distinguish hypotheses using external behavior (e.g., if the model is scheming).
We can liken this to going from print statement debugging to using an actual debugger. Print statement debugging often requires many experiments, because each time you gain only a few bits of information, which sketch a strange, confusing, and potentially misleading picture. When you start using the debugger, you suddenly notice all at once that you’re making a lot of incorrect assumptions you didn’t even realize you were [...]
---
Outline:
(00:38) The value of understanding
(02:32) AMI has good feedback loops
(04:48) The past and future of AMI
The original text contained 1 footnote, which was omitted from this narration.
---
First published:
December 5th, 2025
Source:
https://www.lesswrong.com/posts/Hy6PX43HGgmfiTaKu/an-ambitious-vision-for-interpretability
---
Narrated by TYPE III AUDIO.
---
Images from the article:
A typical debugging session.
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.