20250418 - ‘Reasoning’ AI is LYING to you! — or maybe it’s just hallucinating again
The chatbot is definitely trying to kill you, maybe. Send us money.
Text version: https://pivot-to-ai.com/2025/04/18/reasoning-ai-is-lying-to-you-or-maybe-its-just-hallucinating-again/
Sources:
Anthropic: Reasoning models don't always say what they think https://www.anthropic.com/research/reasoning-models-dont-say-think
paper (PDF) https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
Introducing Transluce https://transluce.org/introducing-transluce
Investigating truthfulness in a pre-release o3 model https://transluce.org/investigating-o3-truthfulness
Transluce: "These behaviors are surprising." https://x.com/TransluceAI/status/1912552068717637980
(Ars Technica article, edited) Researchers concerned to find AI models misrepresenting their “reasoning” processes https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/
(Ars Technica article, original) Researchers concerned to find AI models hiding their true “reasoning” processes https://web.archive.org/web/20250410231357/https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/
Copyscape is nice for quickly comparing web pages https://copyscape.com
Previously:
Anthropic, Apollo astounded to find a chatbot will lie to you if you tell it to lie to you https://pivot-to-ai.com/2024/12/19/anthropic-and-apollo-astounded-to-find-that-a-chatbot-will-lie-to-you-if-you-tell-it-to-lie-to-you/
How Sam Altman got fired from OpenAI in 2023: not being an AI doom crank (and lying a lot) https://pivot-to-ai.com/2025/04/06/how-sam-altman-got-fired-from-openai-in-2023-not-being-an-ai-doom-crank-and-lying-a-lot/
video: https://www.youtube.com/watch?v=xlrBjeAtJUk&list=UU9rJrMVgcXTfa8xuMnbhAEA
T-shirt store now open! https://pivot-to-ai.redbubble.com
Enhance the channel: https://www.amazon.co.uk/hz/wishlist/ls/3Q8VZW46J6DM6
Please fund my vital AI safety research! The fate of humanity is at stake!
Patreon: https://www.patreon.com/davidgerard
Ko-Fi: https://ko-fi.com/A1529D5