Content provided by Mike Breault. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Mike Breault or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
TD-Gammon: Self-Taught Reinforcement Learning and the Backgammon Breakthrough

5:55
 
Gerald Tesauro's TD-Gammon (early 1990s, IBM) showed that reinforcement learning could reach world-class backgammon by learning from self‑play. A small neural network used temporal-difference (TD) learning to bootstrap its way toward better play: after each move it nudged its evaluation of the previous position toward its evaluation of the position that followed, with the actual game result as the final target. It trained on roughly 1.5 million self‑played games with a three-layer architecture (198 inputs, ~80–160 hidden units, and 4 outputs estimating a White win, a White gammon, a Black win, and a Black gammon). It lost only narrowly to top human players, and its play changed expert strategy (notably how the 2-1 opening roll is played). It also helped inspire later RL breakthroughs such as Deep Q‑Networks and AlphaGo/AlphaZero. The TD error signal has a widely discussed parallel in dopamine-based learning in the brain, which hints at learning principles shared by biological and artificial systems.
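To make the idea of a TD error concrete, here is a minimal sketch of TD(lambda) on a toy five-state random walk. It is not Tesauro's code and has nothing backgammon-specific in it: the environment, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration. A plain table of values stands in for the neural network. In TD-Gammon the same error signal would instead drive a gradient update of the network weights.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.02, lam=0.7,
                          gamma=1.0, n_states=5, seed=0):
    """Estimate state values on a toy random walk with TD(lambda).

    Hypothetical toy setup: states 0..n_states-1 in a row. Stepping off
    the left end ends the episode with reward 0; stepping off the right
    end ends it with reward 1.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states          # initial guess for every state
    for _ in range(episodes):
        traces = [0.0] * n_states      # eligibility traces, reset per episode
        state = n_states // 2          # start in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:                     # fell off the left end
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:           # fell off the right end
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD error: how much better or worse the next position looks
            # than the current estimate predicted.
            td_error = reward + gamma * next_value - values[state]

            # Credit recently visited states through their traces.
            traces[state] += 1.0
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam           # decay older credit

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

In this toy problem the true value of state i is (i + 1) / 6, so the learned estimates should settle near those fractions. The learner is never told these targets. It only sees its own successive predictions and the final outcome, which is the bootstrapping the description refers to.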

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC


1368 episodes
