Projections for Approximate Policy Iteration Algorithms

Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard (2022)
Projections for Approximate Policy Iteration Algorithms.
36th International Conference on Machine Learning. Long Beach, California, USA (09.-15.06.2019)
doi: 10.26083/tuprints-00020582
Conference or Workshop Item, Secondary publication, Publisher's Version

Abstract

Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded by a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase in the policy return during a policy update often requires constraining the change in the action distribution. Several approximations exist in the literature for solving this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
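The core idea of the abstract, replacing a constrained policy update with an unconstrained search carried out through a projection, can be illustrated with a toy example. The sketch below is NOT the paper's construction. It assumes a diagonal Gaussian policy whose covariance stays fixed, so only the mean is updated, and a hypothetical quadratic loss standing in for the negated policy-improvement objective. A projection maps any unconstrained parameter vector onto the set of means whose KL divergence to the old policy is at most eps, and plain gradient descent then runs on the unconstrained parameters through that projection. Forward finite differences replace automatic differentiation to keep the example dependency-free. All function names (mean_kl, project_mean, loss, constrained_update) are illustrative.

```python
import math


def mean_kl(mu, mu_old, std_old):
    """KL divergence between two diagonal Gaussians that share the
    covariance diag(std_old**2), so that only the means differ.

    With equal covariances the KL is symmetric and equals half the squared
    Mahalanobis distance between the two means.
    """
    return 0.5 * sum(((m - m0) / s) ** 2 for m, m0, s in zip(mu, mu_old, std_old))


def project_mean(theta, mu_old, std_old, eps):
    """Hypothetical helper: map unconstrained parameters `theta` to a mean
    whose KL to the old policy is at most `eps`."""
    kl = mean_kl(theta, mu_old, std_old)
    if kl <= eps:
        return list(theta)  # already inside the trust region: identity
    # The KL is quadratic in the step theta - mu_old, so shrinking the step
    # by sqrt(eps / kl) puts the projected mean exactly on the boundary.
    scale = math.sqrt(eps / kl)
    return [m0 + scale * (t - m0) for t, m0 in zip(theta, mu_old)]


def loss(mu, target):
    # Hypothetical stand-in for the negated policy-improvement objective:
    # it is smallest when the mean reaches `target`, which may lie outside
    # the trust region.
    return sum((m - t) ** 2 for m, t in zip(mu, target))


def constrained_update(mu_old, std_old, target, eps, steps=500, lr=0.05, h=1e-6):
    """Plain gradient descent on the unconstrained parameters, with the loss
    evaluated through the projection; forward finite differences replace
    automatic differentiation."""
    theta = list(mu_old)  # start the search at the old policy
    for _ in range(steps):
        base = loss(project_mean(theta, mu_old, std_old, eps), target)
        grad = []
        for i in range(len(theta)):
            shifted = list(theta)
            shifted[i] += h
            value = loss(project_mean(shifted, mu_old, std_old, eps), target)
            grad.append((value - base) / h)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    # Whatever theta ends up being, the returned mean satisfies the KL bound.
    return project_mean(theta, mu_old, std_old, eps)


if __name__ == "__main__":
    mu_old, std_old, target, eps = [0.0, 0.0], [1.0, 1.0], [3.0, 4.0], 0.5
    mu_new = constrained_update(mu_old, std_old, target, eps)
    print(mu_new)  # approximately [0.6, 0.8]
    print(mean_kl(mu_new, mu_old, std_old))  # approximately 0.5
```

Because the projection is the identity inside the trust region and only rescales the step outside it, every iterate of the unconstrained search corresponds to a feasible policy, so this toy version needs neither a penalty coefficient nor a line search. A realistic variant would presumably also handle the covariance, use a sample-based surrogate objective, and rely on automatic differentiation.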

Item Type: Conference or Workshop Item
Published: 2022
Creators: Akrour, Riad ; Pajarinen, Joni ; Peters, Jan ; Neumann, Gerhard
Type of entry: Secondary publication
Title: Projections for Approximate Policy Iteration Algorithms
Language: English
Date: 2022
Place of Publication: Darmstadt
Publisher: PMLR
Book Title: Proceedings of the 36th International Conference on Machine Learning
Series: Proceedings of Machine Learning Research
Series Volume: 97
Event Title: 36th International Conference on Machine Learning
Event Location: Long Beach, California, USA
Event Dates: 09.-15.06.2019
DOI: 10.26083/tuprints-00020582
URL / URN: https://tuprints.ulb.tu-darmstadt.de/20582
Corresponding Links:
Origin: Secondary publication service
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-205824
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Intelligent Autonomous Systems
TU-Projects: EC/H2020|640554|SKILLS4ROBOTS
Date Deposited: 18 Nov 2022 14:34
Last Modified: 11 May 2023 05:42
PPN: 502453931