20250608 - ‘Reasoning’ AIs fail hard if they actually have to think
Anything outside their training.
Patreon: https://www.patreon.com/davidgerard
Ko-Fi: https://ko-fi.com/A1529D5
Buy me nice things: https://www.amazon.co.uk/hz/wishlist/ls/3Q8VZW46J6DM6
Get an extremely cool Pivot to AI shirt or mug: https://pivot-to-ai.redbubble.com
Sources:
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity (PDF) https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (PDF) https://arxiv.org/pdf/2410.05229
Stop Anthropomorphizing Intermediate Tokens As Reasoning/Thinking Traces! (PDF) https://arxiv.org/pdf/2504.09762
Previously on Pivot to AI:
LLMs can solve any word problem! As long as they can crib the answer https://pivot-to-ai.com/2024/07/06/llms-can-solve-any-word-problem-as-long-as-they-can-crib-the-answer/
AI benchmarks are self-promoting trash — but regulators keep using them https://pivot-to-ai.com/2025/02/25/ai-benchmarks-are-self-promoting-trash-but-regulators-keep-using-them/
‘Reasoning’ AI is LYING to you! — or maybe it’s just hallucinating again https://pivot-to-ai.com/2025/04/18/reasoning-ai-is-lying-to-you-or-maybe-its-just-hallucinating-again/
Video: https://www.youtube.com/watch?v=dNT0LcCqtss&list=UU9rJrMVgcXTfa8xuMnbhAEA
Full Pivot to AI playlist: https://www.youtube.com/playlist?list=UU9rJrMVgcXTfa8xuMnbhAEA