Content provided by Jacob Haimes and Igor Krawczuk. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Jacob Haimes and Igor Krawczuk or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

AI, Reasoning or Rambling?

1:11:08
 

In this episode, we argue that the "reasoning" of current AI models is better described as rambling, drawing on the "illusion of thinking" and "Potemkin understanding" findings. We contrast the classical definition of reasoning, which requires logic and consistency, with Big Tech's new version, a generic claim about information processing. We explain how Large Rambling Models generate long, often irrelevant traces whose apparent benchmark gains are largely explained by best-of-N sampling and benchmark gaming.

Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.
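The best-of-N effect discussed in the episode can be sketched with a toy simulation. All numbers below are illustrative assumptions, not figures from the episode or the cited papers: the point is only that a model with weak per-sample accuracy looks far stronger when an oracle verifier picks the best of N independent samples.

```python
import random

def sample_is_correct(p_correct: float) -> bool:
    """Simulate one model sample: correct with probability p_correct."""
    return random.random() < p_correct

def best_of_n(p_correct: float, n: int) -> bool:
    """Best-of-N with an oracle verifier: the problem counts as solved
    if any of the n independent samples is correct."""
    return any(sample_is_correct(p_correct) for _ in range(n))

def pass_rate(p_correct: float, n: int, trials: int = 20_000) -> float:
    """Estimate pass@N over many simulated problems."""
    return sum(best_of_n(p_correct, n) for _ in range(trials)) / trials

random.seed(0)
# A weak per-sample model (20% accuracy here, an arbitrary choice) looks
# far stronger under best-of-N; analytically, pass@N = 1 - (1 - 0.2)**N,
# so pass@16 is roughly 0.97 even though pass@1 stays at 0.2.
for n in (1, 4, 16):
    print(f"pass@{n}: {pass_rate(0.2, n):.2f}")
```

The benchmark-gaming worry is that reported scores often reflect the pass@N curve (with a verifier or answer key doing the selection), not single-sample reliability.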

  • (00:00) - Intro
  • (00:40) - OBB update and Meta's talent acquisition
  • (03:09) - What are rambling models?
  • (04:25) - Definitions and polarization
  • (09:50) - Logic and consistency
  • (17:00) - Why does this matter?
  • (21:40) - More likely explanations
  • (35:05) - The "illusion of thinking" and task complexity
  • (39:07) - "Potemkin understanding" and surface-level recall
  • (50:00) - Benchmark gaming and best-of-N sampling
  • (55:40) - Costs and limitations
  • (58:24) - Claude's anecdote and the Vending Bench
  • (01:03:05) - Definitional switch and implications
  • (01:10:18) - Outro

Links
  • Apple paper - The Illusion of Thinking
  • ICML 2025 paper - Potemkin Understanding in Large Language Models
  • Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Theoretical understanding

  • Max M. Schlereth Manuscript - The limits of AGI part II
  • Preprint - (How) Do Reasoning Models Reason?
  • Preprint - A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
  • NeurIPS 2024 paper - How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Empirical explanations

  • Preprint - How Do Large Language Monkeys Get Their Power (Laws)?
  • Andon Labs Preprint - Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
  • LeapLab, Tsinghua University and Shanghai Jiao Tong University paper - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Preprint - RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
  • Preprint - Mind The Gap: Deep Learning Doesn't Learn Deeply
  • Preprint - Measuring AI Ability to Complete Long Tasks
  • Preprint - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Other sources

  • Zuck's Haul webpage - Meta's talent acquisition tracker
    • Hacker News discussion - Opinions from the AI community
  • Interconnects blogpost - The rise of reasoning machines
  • Anthropic blog - Project Vend: Can Claude run a small shop?

