Player of Games: All the games, one algorithm! (w/ author Martin Schmid)

Player of Games: All the games, one algorithm! (w/ author Martin Schmid)

Jan 05, 2022
|
41 views
Details
#playerofgames #deepmind #alphazero Special Guest: First author Martin Schmid (https://twitter.com/Lifrordi) Games have been used throughout research as testbeds for AI algorithms, such as reinforcement learning agents. However, different types of games usually require different solution approaches, such as AlphaZero for Go or Chess, and Counterfactual Regret Minimization (CFR) for Poker. Player of Games bridges this gap between perfect and imperfect information games and delivers a single algorithm that uses tree search over public information states, and is trained via self-play. The resulting algorithm can play Go, Chess, Poker, Scotland Yard, and many more games, as well as non-game environments. OUTLINE: 0:00 - Introduction 2:50 - What games can Player of Games be trained on? 4:00 - Tree search algorithms (AlphaZero) 8:00 - What is different in imperfect information games? 15:40 - Counterfactual Value- and Policy-Networks 18:50 - The Player of Games search procedure 28:30 - How to train the network? 34:40 - Experimental Results 47:20 - Discussion & Outlook Paper: https://arxiv.org/abs/2112.03178 Abstract: Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning. Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

0:00 - Introduction 2:50 - What games can Player of Games be trained on? 4:00 - Tree search algorithms (AlphaZero) 8:00 - What is different in imperfect information games? 15:40 - Counterfactual Value- and Policy-Networks 18:50 - The Player of Games search procedure 28:30 - How to train the network? 34:40 - Experimental Results 47:20 - Discussion & Outlook
Comments
loading...