Joppen, Tobias (2022)
An Ordinal Agent Framework.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00019749
Dissertation, first publication, publisher's version
Abstract
In this thesis, we introduce algorithms to solve ordinal multi-armed bandit problems, Monte-Carlo tree search, and reinforcement learning problems. In ordinal problems, an agent does not receive numerical rewards but ordinal rewards, which carry no distance measure. For humans, it is often hard to define or determine exact numerical feedback signals, but simpler to come up with an ordering over possibilities. For instance, in medical treatment, the ordering patient death < patient ill < patient cured is easy to come up with, but it is hard to assign numerical values to these outcomes. As most state-of-the-art algorithms rely on numerical operations, they cannot be applied in the presence of ordinal rewards. We present a preference-based approach that extends dueling bandits to sequential decision problems and discuss its disadvantages in terms of sample efficiency and scalability. Following another idea, our final approach to identifying optimal arms is based on the comparison of reward distributions using the Borda method. We test this approach on multi-armed bandits, extend it to Monte-Carlo tree search, and also apply it to reinforcement learning. To do so, we introduce a framework that encapsulates the similarities of the different problem definitions. We test our ordinal algorithms on frameworks like the General Video Game Framework (GVGAI), OpenAI, or synthetic data and compare them to ordinal, numerical, or domain-specific algorithms. Since the running time of our algorithms depends on the number of distinct ordinal rewards observed, we introduce a binning method that artificially reduces the number of rewards.
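The idea of comparing reward distributions with the Borda method can be illustrated with a minimal sketch. It assumes a common formulation in which an arm's Borda score is its average probability of yielding a better ordinal reward than each other arm, with ties counted as half a win; the thesis may define the score differently. The function names, the integer encoding of the ordinal levels, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def win_probability(samples_a, samples_b):
    """Estimate P(reward of a is better than reward of b); ties count half.

    Only order comparisons (>, ==) are used, never arithmetic on the
    reward values, so the rewards can be purely ordinal.
    Assumes both sample lists are non-empty.
    """
    count_a, count_b = Counter(samples_a), Counter(samples_b)
    total = len(samples_a) * len(samples_b)
    wins = 0.0
    for (ra, na), (rb, nb) in product(count_a.items(), count_b.items()):
        if ra > rb:
            wins += na * nb
        elif ra == rb:
            wins += 0.5 * na * nb
    return wins / total


def borda_scores(samples_by_arm):
    """Average win probability of each arm against all other arms.

    Assumes at least two arms.
    """
    arms = list(samples_by_arm)
    scores = {}
    for a in arms:
        others = [b for b in arms if b != a]
        scores[a] = sum(
            win_probability(samples_by_arm[a], samples_by_arm[b]) for b in others
        ) / len(others)
    return scores


# Hypothetical ordinal levels: 0 = patient death, 1 = patient ill, 2 = patient cured.
# The integers only encode the ordering; their distances carry no meaning.
observed = {
    "treatment_A": [2, 2, 0, 2],
    "treatment_B": [1, 1, 1, 1],
}
scores = borda_scores(observed)
best = max(scores, key=scores.get)
print(scores, best)
```

Because the per-pair comparison iterates over the distinct reward values seen for each arm, its cost grows with the number of distinct ordinal rewards, which is consistent with the motivation given above for reducing that number by binning.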
Item type: | Dissertation |
---|---|
Published: | 2022 |
Author(s): | Joppen, Tobias |
Type of entry: | First publication |
Title: | An Ordinal Agent Framework |
Language: | English |
Referees: | Kersting, Prof. Dr. Kristian ; Fürnkranz, Prof. Dr. Johannes |
Year of publication: | 2022 |
Place: | Darmstadt |
Collation: | xii, 105 pages |
Date of oral examination: | 19 March 2021 |
DOI: | 10.26083/tuprints-00019749 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/19749 |
Status: | Publisher's version |
URN: | urn:nbn:de:tuda-tuprints-197490 |
Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science |
Department(s)/field(s): | 20 Department of Computer Science ; 20 Department of Computer Science > Artificial Intelligence and Machine Learning |
Date deposited: | 03 Mar 2022 13:25 |
Last modified: | 04 Mar 2022 11:45 |