
Entropic Risk Measure in Policy Search

Nass, David ; Belousov, Boris ; Peters, Jan (2022):
Entropic Risk Measure in Policy Search. (Postprint)
In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1101-1106,
Darmstadt, IEEE, International Conference on Intelligent Robots and Systems (IROS), Macau, China, 03.-08.11.2019, e-ISSN 2153-0866, ISBN 978-1-7281-4004-9,
DOI: 10.26083/tuprints-00020551,
[Conference or Workshop Item]

Abstract

With the increasing pace of automation, modern robotic systems need to act in stochastic, non-stationary, partially observable environments. A range of algorithms for finding parameterized policies that optimize long-term average performance have been proposed in the past. However, the majority of the proposed approaches do not explicitly take into account the variability of the performance metric, which may lead to policies that, although performing well on average, can perform spectacularly badly in a particular run or over a period of time. To address this shortcoming, we study an approach to policy optimization that explicitly takes into account higher-order statistics of the reward function. In this paper, we extend policy gradient methods to include the entropic risk measure in the objective function and evaluate their performance in simulation experiments and on a real-robot task of learning a hitting motion in robot badminton.
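The entropic risk measure mentioned in the abstract replaces the plain expected return E[R] with (1/β) log E[exp(βR)], which penalizes return variability for β < 0 and favors it for β > 0. Below is a minimal Monte-Carlo sketch of this quantity, stabilized with a log-sum-exp shift; the function name and sign convention for β are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def entropic_risk(returns, beta):
    """Monte-Carlo estimate of (1/beta) * log E[exp(beta * R)].

    beta < 0 is risk-averse (low-variance policies are preferred),
    beta > 0 is risk-seeking; as beta -> 0 the measure approaches
    the ordinary expected return E[R].
    """
    x = beta * np.asarray(returns, dtype=float)
    m = x.max()  # log-sum-exp shift for numerical stability
    log_mean_exp = m + np.log(np.mean(np.exp(x - m)))
    return log_mean_exp / beta

# Variable returns under a risk-averse beta: the risk-adjusted
# value lies below the sample mean of 0.5.
returns = np.array([0.0, 1.0])
print(entropic_risk(returns, beta=-1.0))  # below the mean
print(entropic_risk(returns, beta=1e-8))  # close to the mean
```

In a policy-search setting, an objective of this form can be optimized with a score-function (REINFORCE-style) gradient in which trajectory returns are exponentially reweighted by exp(βR); the details of how the paper incorporates this into policy gradient methods are in the full text.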

Item Type: Conference or Workshop Item
Erschienen: 2022
Creators: Nass, David ; Belousov, Boris ; Peters, Jan
Origin: Secondary publication service
Status: Postprint
Title: Entropic Risk Measure in Policy Search
Language: English

Book Title: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Place of Publication: Darmstadt
Publisher: IEEE
ISBN: 978-1-7281-4004-9
Collation: 6 pages
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Intelligent Autonomous Systems
TU-Projects: EC/H2020|640554|SKILLS4ROBOTS
Event Title: International Conference on Intelligent Robots and Systems (IROS)
Event Location: Macau, China
Event Dates: 03.-08.11.2019
Date Deposited: 22 Nov 2022 09:52
DOI: 10.26083/tuprints-00020551
URL / URN: https://tuprints.ulb.tu-darmstadt.de/20551
URN: urn:nbn:de:tuda-tuprints-205513
