Content provided by Roger Basler de Roca. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Roger Basler de Roca or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

The Illusion of Thinking: Decoding AI's Reasoning Limits

15:38
 

In this episode, we enter the world of Large Reasoning Models (LRMs).

We explore advanced AI systems such as OpenAI’s o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking—models that generate detailed "thinking processes" (Chain-of-Thought, CoT) with built-in self-reflection before answering.

These systems promise a new era of problem-solving. Yet, their true capabilities, scaling behavior, and limitations remain only partially understood.

By conducting systematic investigations in controlled puzzle environments—including the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World—we uncover both the strengths and surprising weaknesses of LRMs.

These environments allow precise control over task complexity while avoiding data contamination issues that often plague established benchmarks in mathematics and coding.
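To illustrate why such puzzles allow precise complexity control: in the Tower of Hanoi, the shortest solution for n disks takes exactly 2^n − 1 moves, so difficulty can be dialed up one disk at a time. A minimal sketch of the classic recursive solver (not taken from the episode or the paper):

```python
def hanoi(n, src, aux, dst, moves):
    """Classic recursive Tower of Hanoi: move n disks from src to dst."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)   # clear the n-1 smaller disks out of the way
    moves.append((src, dst))             # move the largest remaining disk
    hanoi(n - 1, aux, src, dst, moves)   # restack the smaller disks on top

# Each extra disk doubles the minimal solution length: 2^n - 1 moves.
for n in range(1, 11):
    moves = []
    hanoi(n, "A", "B", "C", moves)
    assert len(moves) == 2**n - 1
```

Because the instance size is a single integer, researchers can sweep complexity smoothly, something benchmark datasets in math and coding do not allow.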

A striking finding: LRMs face a complete accuracy collapse beyond certain complexity thresholds. Paradoxically, their reasoning effort (measured in "thinking tokens") first increases with complexity, only to decline after a point—even when token budgets are sufficient.

We identify three distinct performance regimes:

  • Low-complexity tasks – where standard Large Language Models (LLMs) still outperform LRMs.

  • Medium-complexity tasks – where LRMs’ additional "thinking" shows a clear advantage.

  • High-complexity tasks – where both LLMs and LRMs collapse entirely.

Another challenge is “overthinking.” On simpler problems, LRMs often find correct solutions early but continue to pursue false alternatives, wasting computational resources. Even more surprising is their weakness in exact computation: they fail to leverage explicit algorithms, even when provided, and show inconsistent reasoning across different puzzle types.
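One reason these failures are measurable at all is that every proposed move sequence can be checked mechanically against the puzzle rules. A minimal sketch of such a move validator for the Tower of Hanoi (function name and peg labels are illustrative, not from the paper):

```python
def valid_hanoi_sequence(n, moves):
    """Simulate a proposed move list on n disks; reject any illegal move."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # tops at list ends
    for src, dst in moves:
        if not pegs[src]:
            return False                       # moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                       # larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))  # solved: all disks on the goal peg

# A model's (or solver's) output can be replayed move by move:
assert valid_hanoi_sequence(3, [("A","C"),("A","B"),("C","B"),("A","C"),
                                ("B","A"),("B","C"),("A","C")])
```

Replaying traces this way is what lets the study pinpoint exactly where in a long solution a model's "exact computation" breaks down.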

This episode invites you to rethink assumptions about AI’s capacity for generalizable reasoning. What does it truly mean for a machine to "think" under increasing complexity? And how should these insights shape the next generation of AI design and deployment?

Sources: Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv preprint arXiv:2506.06941. https://arxiv.org/abs/2506.06941

Disclaimer: This podcast is generated by Roger Basler de Roca using AI. The voices are artificially generated, and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only.

https://rogerbasler.ch/en/contact/


52 episodes

