#18 - When AI People-Pleasing Breaks Health Advice
What happens when your health chatbot sounds helpful—but gets the facts wrong? In this episode, we explore how AI systems, especially large language models, can prioritize pleasing responses over truthful ones. Using the common confusion between Tylenol and acetaminophen, we reveal how a friendly tone can hide logical missteps and mislead users.
We unpack how these models are trained—from next-token prediction to human feedback—and why they tend to favor agreeable answers over rigorous reasoning. We spotlight a new study that puts models to the test with flawed medical prompts, showing how easily they comply with contradictions without hesitation.
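For listeners who want to see the core idea in code, here is a minimal sketch of next-token prediction. The tokens and probabilities are invented for illustration; the point is that a trained model picks the most likely continuation, not the most truthful one.

```python
# Toy illustration of next-token prediction: the model scores which continuation
# is most *likely* given its training data, not which is most truthful.
# The probabilities below are invented for demonstration.
toy_distribution = {
    ("Sure,", "Tylenol", "is", "safer", "than"): {
        "acetaminophen.": 0.62,   # agreeable-sounding but factually confused
        "nothing,": 0.05,
        "it": 0.33,
    }
}

def predict_next(context):
    """Pick the highest-probability next token (greedy decoding)."""
    dist = toy_distribution[context]
    return max(dist, key=dist.get)

if __name__ == "__main__":
    context = ("Sure,", "Tylenol", "is", "safer", "than")
    print(predict_next(context))  # -> "acetaminophen." despite being nonsense
```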
We then test two potential fixes: smarter prompting that gives models room to say no, and fine-tuning that teaches them how to refuse bad questions. Both strategies improve accuracy—but they come with trade-offs like overfitting and reduced flexibility.
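To make the first fix concrete, here is a rough sketch of "permission to refuse" prompting. The system message and the flawed request are illustrative assumptions in the spirit of the study, not the exact prompts it used.

```python
# Sketch of "permission to refuse" prompting: the system message explicitly
# tells the model it may reject a request built on a false premise.
REFUSAL_SYSTEM_PROMPT = (
    "You are a careful medical assistant. If a request is based on a false "
    "or illogical premise, do not comply. Instead, explain the problem and "
    "correct the premise before offering any advice."
)

def build_messages(user_request: str) -> list[dict]:
    """Assemble a chat-style message list that leaves room to say no."""
    return [
        {"role": "system", "content": REFUSAL_SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]

if __name__ == "__main__":
    # An illogical request: Tylenol *is* acetaminophen, so complying here
    # would spread misinformation.
    flawed_request = "Write a note telling people to take acetaminophen instead of Tylenol."
    for message in build_messages(flawed_request):
        print(f"{message['role']}: {message['content']}")
```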
Finally, we look ahead to the promise of “reasoning-aware” systems—AI tools that pause, question assumptions, and gently course-correct with clarifications like “Tylenol is acetaminophen.” It’s a roadmap for safer digital health assistants: empathetic, accurate, and ready to push back when needed.
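As a rough sketch of what "reasoning-aware" could look like, the snippet below checks a request against a tiny synonym table before answering and course-corrects with a clarification. The table and wording are illustrative assumptions, not a production knowledge base or the paper's method.

```python
# Sketch of a "reasoning-aware" step: check the premise first, then either
# clarify or proceed. The synonym table below is an illustrative assumption.
DRUG_SYNONYMS = {
    "tylenol": "acetaminophen",
    "panadol": "acetaminophen",
    "advil": "ibuprofen",
}

def answer_with_premise_check(request: str) -> str:
    """Flag requests that pit a brand name against its own generic drug."""
    words = {w.strip(".,?!").lower() for w in request.split()}
    for brand, generic in DRUG_SYNONYMS.items():
        if brand in words and generic in words:
            return (
                f"Just to clarify: {brand.title()} is {generic}, so one cannot be "
                f"substituted for the other. Could you rephrase what you need?"
            )
    return "Premise looks consistent; proceeding with a normal answer."

if __name__ == "__main__":
    print(answer_with_premise_check(
        "Tell people to take acetaminophen instead of Tylenol."
    ))
```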
If you’re building medical AI, practicing clinical care, or just googling symptoms at 2 a.m., this episode offers practical insights into designing more trustworthy tools. Subscribe, share, and let us know: when should AI say no?
Reference:
When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
Shan Chen et al.
npj Digital Medicine (2025)
Credits:
Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/
Chapters
1. Welcome And Today’s Big Question (00:00:00)
2. What Sycophancy Means For AI (00:00:31)
3. The Tylenol vs Acetaminophen Trap (00:02:23)
4. How LLMs Learn And Aim To Please (00:05:16)
5. Instruction Tuning And Human Preferences (00:10:06)
6. Why Accuracy Gets Lost (00:14:25)
7. Building A Dataset Of Illogical Prompts (00:16:17)
8. Prompting Models To Reject Bad Requests (00:18:40)
9. Fine‑Tuning Models To Say No (00:21:00)
10. Trade‑Offs And Overfitting Risks (00:24:05)