Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard (2019)
Projections for Approximate Policy Iteration Algorithms.
36th International Conference on Machine Learning. Long Beach, California, USA (09.-15.06.2019)
Conference publication, Bibliography
This is the latest version of this entry.
Abstract
Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator, and which has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature for solving this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
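The constrained-to-unconstrained idea can be illustrated with a minimal sketch. This is not the paper's exact construction; it assumes, purely for illustration, a diagonal Gaussian policy and an entropy lower bound `beta`, and uses a stand-in surrogate objective. The point is that a differentiable projection maps arbitrary parameters onto the feasible set, so the constrained update can be optimized with plain gradient descent on the composed objective.

```python
# Minimal sketch (assumptions: diagonal Gaussian policy, hypothetical entropy
# lower bound `beta`, stand-in surrogate objective). Not the paper's exact projections.
import math
import torch

def project_log_std(log_std: torch.Tensor, beta: float) -> torch.Tensor:
    """Shift all log-stds uniformly so the Gaussian entropy is at least `beta`.

    The shift is differentiable (and the identity when the constraint already
    holds), so it can simply be composed with the objective.
    """
    d = log_std.numel()
    # Entropy of a d-dimensional diagonal Gaussian:
    #   H = 0.5 * d * (1 + log(2*pi)) + sum_i log(sigma_i)
    entropy = 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + log_std.sum()
    deficit = torch.clamp(beta - entropy, min=0.0)  # how far below the bound we are
    return log_std + deficit / d

# Toy usage: optimize a stand-in surrogate with plain SGD; the entropy
# constraint is handled entirely by the projection, not by the optimizer.
mean = torch.zeros(3, requires_grad=True)
log_std = torch.full((3,), -2.0, requires_grad=True)  # deliberately low entropy
beta = 3.0                                            # hypothetical entropy bound
opt = torch.optim.SGD([mean, log_std], lr=1e-2)

for _ in range(200):
    proj_log_std = project_log_std(log_std, beta)
    # Stand-in for the policy-update objective: pulls the mean to zero and the
    # log-stds toward -1, which would violate the entropy bound without the projection.
    surrogate = -(mean ** 2).sum() - ((proj_log_std + 1.0) ** 2).sum()
    loss = -surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    final = project_log_std(log_std, beta)
    d = final.numel()
    entropy = 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + final.sum()
    print("projected entropy:", entropy.item(), ">= beta:", beta)
```

Because the projection is differentiable, the optimizer never needs to know about the constraint; projections toward the previous policy (e.g. KL-style) can in principle be composed with the objective in the same way.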
| Type of entry: | Conference publication |
|---|---|
| Published: | 2019 |
| Author(s): | Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard |
| Kind of entry: | Bibliography |
| Title: | Projections for Approximate Policy Iteration Algorithms |
| Language: | English |
| Year of publication: | 2019 |
| Place: | Darmstadt |
| Publisher: | PMLR |
| Book title: | Proceedings of the 36th International Conference on Machine Learning |
| Series: | Proceedings of Machine Learning Research |
| Series volume: | 97 |
| Event title: | 36th International Conference on Machine Learning |
| Event location: | Long Beach, California, USA |
| Event dates: | 09.–15.06.2019 |
| Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science |
| Department(s): | 20 Department of Computer Science; 20 Department of Computer Science > Intelligent Autonomous Systems |
| TU projects: | EC/H2020\|640554\|SKILLS4ROBOTS |
| Date deposited: | 02 Aug 2024 12:45 |
| Last modified: | 02 Aug 2024 12:45 |
Available versions of this entry

- Projections for Approximate Policy Iteration Algorithms. (deposited 18 Nov 2022 14:34)
- Projections for Approximate Policy Iteration Algorithms. (deposited 02 Aug 2024 12:45) [Currently displayed]