Content provided by Stephen Auger. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Stephen Auger or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

AI Caught 'Cheating' Its Medical Exams - New Research Paper from Microsoft

5:09
 
Top AI models are acing medical benchmarks, but are they actually ready for the clinic? A groundbreaking study reveals that impressive scores can hide a dangerous lack of real-world robustness. In this episode, we break down the ingenious "stress tests" that expose how AI can succeed on an exam for all the wrong reasons, from guessing answers without seeing the medical images to failing when the question format is slightly changed. Tune in to understand why we must move beyond leaderboard scores and start demanding real proof of clinical readiness.

"The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks". Gu et al. 22 Sept 2025.

Link to the paper: https://arxiv.org/html/2509.18234v1

#Microsoft #OpenAI #Gemini #HealthAI #AIinHealthcare #DigitalHealth #MedicalAI #ClinicalAI #PatientSafety #Tech #Innovation #MachineLearning #LLM #ai in medicine

Music generated by Mubert https://mubert.com/render

31 episodes

