Embryology of AI: How Training Data Shapes AI Development w/ Timaeus' Jesse Hoogland & Daniel Murfet
Jesse Hoogland and Daniel Murfet, founders of Timaeus, introduce their mathematically rigorous approach to AI safety through "developmental interpretability" based on Singular Learning Theory. They explain how neural network loss landscapes are actually complex, jagged surfaces full of "singularities" where models can change internally without affecting external behavior—potentially masking dangerous misalignment. Using their Local Learning Coefficient measure, they've demonstrated the ability to identify critical phase changes during training in models up to 7 billion parameters, offering a complementary approach to mechanistic interpretability. This work aims to move beyond trial-and-error neural network training toward a more principled engineering discipline that could catch safety issues during training rather than after deployment.
Sponsors:
Oracle Cloud Infrastructure: Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive
The AGNTCY: The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at https://agntcy.org/?utmcampaign=fy25q4agntcyamerpaid-mediaagntcy-cognitiverevolutionpodcast&utmchannel=podcast&utmsource=podcast
NetSuite by Oracle: NetSuite by Oracle is the AI-powered business management suite trusted by over 41,000 businesses, offering a unified platform for accounting, financial management, inventory, and HR. Gain total visibility and control to make quick decisions and automate everyday tasks—download the free ebook, Navigating Global Trade: Three Insights for Leaders, at https://netsuite.com/cognitive
CHAPTERS:
(00:00) Teaser
(04:44) About the Episode
(09:28) Introduction and Background
(11:01) Timaeus Origins and Philosophy
(14:18) Mathematical Foundations and SLT
(17:11) Developmental Interpretability Approach (Part 1)
(20:53) Sponsors: Oracle Cloud Infrastructure | The AGNTCY
(22:53) Developmental Interpretability Approach (Part 2)
(24:08) Proto-Paradigm and SAEs
(29:21) Generalization Theory Deep Dive
(34:59) Central Dogma Framework (Part 1)
(36:57) Sponsor: NetSuite by Oracle
(38:21) Central Dogma Framework (Part 2)
(39:19) Loss Landscape Geometry
(45:25) Degeneracies and Singularities
(52:09) Structure and Generalization
(01:00:20) Essential Dynamics Research
(01:05:04) Grokking vs Typical Learning
(01:12:03) Double Descent Discussion
(01:14:39) Interpretability and Alignment Applications
(01:22:01) Reward Hacking and Overgeneralization
(01:30:03) Future Training Vision
(01:36:20) Scaling and Compute Requirements
(01:38:19) Future Research Directions
(01:41:27) Outro
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis