Entropic Risk Measure in Policy Search

Nass, David ; Belousov, Boris ; Peters, Jan (2022)
Entropic Risk Measure in Policy Search.
International Conference on Intelligent Robots and Systems (IROS). Macau, China (03.-08.11.2019)
Conference publication, bibliography

This is the latest version of this entry.

Abstract

With the increasing pace of automation, modern robotic systems need to act in stochastic, non-stationary, partially observable environments. A range of algorithms for finding parameterized policies that optimize for long-term average performance have been proposed in the past. However, the majority of the proposed approaches do not explicitly take into account the variability of the performance metric, which may lead to policies that, although performing well on average, can perform spectacularly badly in a particular run or over a period of time. To address this shortcoming, we study an approach to policy optimization that explicitly takes into account higher-order statistics of the reward function. In this paper, we extend policy gradient methods to include the entropic risk measure in the objective function and evaluate their performance in simulation experiments and on a real-robot task of learning a hitting motion in robot badminton.

Item type: Conference publication
Published: 2022
Author(s): Nass, David ; Belousov, Boris ; Peters, Jan
Entry type: Bibliography
Title: Entropic Risk Measure in Policy Search
Language: English
Year of publication: 2022
Place: Darmstadt
Publisher: IEEE
Book title: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Collation: 6 pages
Event title: International Conference on Intelligent Robots and Systems (IROS)
Event location: Macau, China
Event date: 03.-08.11.2019
Dewey Decimal Classification (DDC) subject group: 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Intelligent Autonomous Systems
TU projects: EC/H2020|640554|SKILLS4ROBOTS
Date deposited: 02 Aug 2024 12:45
Last modified: 02 Aug 2024 12:45