Akrour, Riad; Pajarinen, Joni; Peters, Jan; Neumann, Gerhard (2022):
Projections for Approximate Policy Iteration Algorithms. (Publisher's Version)
In: Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, 97), pp. 181-190.
36th International Conference on Machine Learning, Long Beach, California, USA, 09.-15.06.2019. Darmstadt: PMLR. DOI: 10.26083/tuprints-00020582
[Conference or Workshop Item]
Abstract
Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature for solving this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
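The core idea the abstract describes, composing the objective with a projection onto the constraint set so that plain gradient descent can replace a constrained solver, can be illustrated with a minimal sketch. The example below is not the paper's algorithm: it assumes a 1-D Gaussian policy, an entropy lower bound as the constraint, a toy quadratic surrogate objective, and finite-difference gradients. All specifics (`TARGET`, `BETA`, the learning rate) are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: push the policy mean toward TARGET while keeping the
# Gaussian's differential entropy above BETA. Both constants are assumptions
# for illustration, not values from the paper.
TARGET = 1.5
BETA = 0.0  # entropy lower bound

def entropy(std):
    """Differential entropy of a 1-D Gaussian N(mu, std^2)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * std ** 2)

def project(mu, std, beta=BETA):
    """Map arbitrary (mu, std) into the feasible set {std : H(std) >= beta}.

    The mean is unconstrained; the standard deviation is clipped from below
    at the smallest value whose entropy meets the bound."""
    min_std = np.sqrt(np.exp(2.0 * beta) / (2.0 * np.pi * np.e))
    return mu, max(std, min_std)

def objective(mu, std):
    """Toy surrogate return: quadratic penalty on the mean plus a small
    penalty on exploration noise (so the constraint becomes active)."""
    return -(mu - TARGET) ** 2 - 0.1 * std ** 2

# Unconstrained parameters theta = (mu, log_std). Gradient ascent runs on
# theta directly; the projection is composed into the objective, so the
# entropy constraint holds by construction and no constrained solver is
# needed -- the transformation the abstract describes.
theta = np.array([0.0, np.log(2.0)])
lr, eps = 0.05, 1e-5

for _ in range(500):
    grad = np.zeros_like(theta)
    for i in range(2):
        # Finite-difference gradient of the projected objective.
        up, dn = theta.copy(), theta.copy()
        up[i] += eps
        dn[i] -= eps
        f_up = objective(*project(up[0], np.exp(up[1])))
        f_dn = objective(*project(dn[0], np.exp(dn[1])))
        grad[i] = (f_up - f_dn) / (2.0 * eps)
    theta += lr * grad  # ascent on the projected surrogate

mu, std = project(theta[0], np.exp(theta[1]))
print(f"mu={mu:.3f}, std={std:.3f}, entropy={entropy(std):.3f} (bound {BETA})")
```

Because the projection maps every parameter vector into the feasible set, the iterate satisfies the entropy bound at each step; the optimizer simply follows gradients of the projected objective until the clip becomes active, which is the practical benefit of trading a constrained update for an unconstrained one.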
Item Type: | Conference or Workshop Item |
---|---|
Published: | 2022 |
Creators: | Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard |
Origin: | Secondary publication service |
Status: | Publisher's Version |
Title: | Projections for Approximate Policy Iteration Algorithms |
Language: | English |
Book Title: | Proceedings of the 36th International Conference on Machine Learning |
Series: | Proceedings of Machine Learning Research |
Series Volume: | 97 |
Place of Publication: | Darmstadt |
Publisher: | PMLR |
Divisions: | 20 Department of Computer Science; 20 Department of Computer Science > Intelligent Autonomous Systems |
TU-Projects: | EC/H2020|640554|SKILLS4ROBOTS |
Event Title: | 36th International Conference on Machine Learning |
Event Location: | Long Beach, California, USA |
Event Dates: | 09.-15.06.2019 |
Date Deposited: | 18 Nov 2022 14:34 |
DOI: | 10.26083/tuprints-00020582 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/20582 |
URN: | urn:nbn:de:tuda-tuprints-205824 |