TD-Gammon: Self-Taught Reinforcement Learning and the Backgammon Breakthrough
Content provided by Mike Breault.
Gerald Tesauro's TD-Gammon (early 1990s, IBM) showed that reinforcement learning could reach world-class backgammon by learning from self-play alone. A small neural network used temporal-difference learning to bootstrap its way toward better play: each position's prediction was nudged toward the prediction for the position that followed it, and ultimately toward the game's actual result. It trained on roughly 1.5 million self-played games with a 3-layer architecture (198 inputs, ~80–160 hidden units, 4 outputs predicting a White or Black win, with or without a gammon). It lost only narrowly to top human players and, in doing so, changed human strategy (notably how to play an opening 2-1 roll) and helped spark the modern RL breakthroughs that later produced Deep Q-Networks and AlphaGo/AlphaZero. The TD error signal also draws a provocative parallel to dopamine-based learning in the brain, suggesting that biological and artificial learners may share the same underlying principles.
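To make the bootstrapping idea concrete, here is a minimal sketch of a temporal-difference update on a network with the shape described above. It is not Tesauro's code. The hidden size (80), learning rate, weight initialization, and the function names `init_net`, `forward`, and `td0_update` are assumptions made for illustration, and the sketch uses the simplest one-step variant, TD(0), whereas TD-Gammon used TD(λ) with eligibility traces. The encoding of a board into the 198 inputs is not covered here, so `x` is just a placeholder vector.

```python
import math
import random

# Layer sizes follow the episode description; 80 hidden units is an assumed
# pick from the quoted ~80-160 range.
N_INPUTS, N_HIDDEN, N_OUTPUTS = 198, 80, 4

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_net(seed=0):
    """Small random weights (assumed initialization, no bias terms)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
    return w1, w2

def forward(net, x):
    """Return hidden activations and the 4 outcome predictions for input x."""
    w1, w2 = net
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w2]
    return h, y

def td0_update(net, x, target, alpha=0.1):
    """Nudge the prediction for x toward target and return the TD errors.

    During a game, target is the network's own prediction for the next
    position; at the end of a game it is the actual outcome vector.
    """
    w1, w2 = net
    h, y = forward(net, x)
    delta = [t - yk for t, yk in zip(target, y)]  # TD error per output
    # Error scaled by the derivative of each output sigmoid.
    out_grad = [d * yk * (1.0 - yk) for d, yk in zip(delta, y)]
    # Backpropagate to the hidden layer using the weights before they change.
    hid_grad = [
        hj * (1.0 - hj) * sum(out_grad[k] * w2[k][j] for k in range(N_OUTPUTS))
        for j, hj in enumerate(h)
    ]
    for k in range(N_OUTPUTS):
        for j in range(N_HIDDEN):
            w2[k][j] += alpha * out_grad[k] * h[j]
    for j in range(N_HIDDEN):
        for i in range(N_INPUTS):
            w1[j][i] += alpha * hid_grad[j] * x[i]
    return delta

# Placeholder position and a terminal outcome (first output = 1, others = 0).
net = init_net()
x = [1.0 if i % 7 == 0 else 0.0 for i in range(N_INPUTS)]
outcome = [1.0, 0.0, 0.0, 0.0]
errors = [sum(abs(d) for d in td0_update(net, x, outcome)) for _ in range(50)]
```

In a self-play loop, a non-terminal step would call `td0_update(net, x_t, forward(net, x_next)[1])`, so the network learns from its own later predictions, and only the final step would use the real result.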
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC