AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong

Blüml, Jannis ; Czech, Johannes ; Kersting, Kristian (2023)
AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong.
In: Frontiers in Artificial Intelligence, 2023, 6
doi: 10.26083/tuprints-00024064
Article, secondary publication, publisher's version

Warning: A newer version of this entry is available.

Abstract

In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks which combine Monte-Carlo tree search with reinforcement learning have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information — a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline, while using a model-based approach: it achieves similar win rates against other Stratego bots like Pipeline Policy Space Response Oracle (P2SRO), while not winning in direct comparison against P2SRO or reaching the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect.

Entry type: Article
Published: 2023
Author(s): Blüml, Jannis ; Czech, Johannes ; Kersting, Kristian
Type of entry: Secondary publication
Title: AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong
Language: English
Year of publication: 2023
Place: Darmstadt
Date of first publication: 2023
Publisher: Frontiers Media S.A.
Journal, newspaper, or series title: Frontiers in Artificial Intelligence
Volume: 6
Collation: 18 pages
DOI: 10.26083/tuprints-00024064
URL / URN: https://tuprints.ulb.tu-darmstadt.de/24064
Origin: Secondary publication DeepGreen
Uncontrolled keywords: imperfect information games, deep neural networks, reinforcement learning, AlphaZero, Monte-Carlo tree search, perfect information Monte-Carlo
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-240643
Dewey Decimal Classification (DDC) subject group: 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Artificial Intelligence and Machine Learning
Central Facilities
Central Facilities > Centre for Cognitive Science (CCS)
Central Facilities > hessian.AI - Hessian Center for Artificial Intelligence
Date deposited: 26 May 2023 11:40
Last modified: 06 Jun 2023 09:09