Projections for Approximate Policy Iteration Algorithms

Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard (2022)
Projections for Approximate Policy Iteration Algorithms.
36th International Conference on Machine Learning. Long Beach, California, USA (09.-15.06.2019)
doi: 10.26083/tuprints-00020582
Conference publication, secondary publication, publisher's version

URL / URN: https://tuprints.ulb.tu-darmstadt.de/20582

Abstract

Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase in the policy return during a policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.

Item type:	Conference publication
Published:	2022
Author(s):	Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard
Type of entry:	Secondary publication
Title:	Projections for Approximate Policy Iteration Algorithms
Language:	English
Year of publication:	2022
Place:	Darmstadt
Publisher:	PMLR
Book title:	Proceedings of the 36th International Conference on Machine Learning
Series:	Proceedings of Machine Learning Research
Series volume:	97
Event title:	36th International Conference on Machine Learning
Event location:	Long Beach, California, USA
Event date:	09.-15.06.2019
DOI:	10.26083/tuprints-00020582
URL / URN:	https://tuprints.ulb.tu-darmstadt.de/20582
Related links:	Identical work, Research data
Origin:	Secondary publication service
Status:	Publisher's version
URN:	urn:nbn:de:tuda-tuprints-205824
Dewey Decimal Classification (DDC) subject group:	000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s):	20 Department of Computer Science; 20 Department of Computer Science > Intelligent Autonomous Systems
TU projects:	EC/H2020|640554|SKILLS4ROBOTS
Date deposited:	18 Nov 2022 14:34
Last modified:	11 May 2023 05:42
PPN:	502453931