Preference-based Reinforcement Learning: A Formal Framework and a Policy Iteration Algorithm

Fürnkranz, Johannes ; Hüllermeier, Eyke ; Cheng, Weiwei ; Park, Sang-Hyeun (2012)
Preference-based Reinforcement Learning: A Formal Framework and a Policy Iteration Algorithm.
In: Machine Learning, 89 (1-2)
doi: 10.1007/s10994-012-5313-8
Article, Bibliography

Abstract

This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of preference-based approximate policy iteration are illustrated by means of two case studies.
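
To make the roll-out-based preference idea concrete, the following is a minimal, hypothetical Python sketch, not taken from the paper: for a given state, each action is evaluated by Monte-Carlo roll-outs under a fixed policy, the observed returns induce pairwise action preferences, and a simple win-count ordering stands in for a learned label ranker. The toy chain environment and all names (step, rollout, action_preferences, rank_actions) are illustrative assumptions, not the authors' implementation.

# Illustrative sketch: action preferences from trajectory roll-outs (assumed toy MDP).
import random
from itertools import combinations

N_STATES, ACTIONS, HORIZON, ROLLOUTS = 5, [0, 1, 2], 20, 30

def step(state, action):
    """Toy dynamics: action 2 drifts toward the goal state, the others are noisy."""
    drift = 1 if action == 2 else random.choice([-1, 0, 1])
    next_state = max(0, min(N_STATES - 1, state + drift))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def rollout(state, first_action, policy):
    """Undiscounted return of one trajectory that starts with first_action."""
    total, action = 0.0, first_action
    for _ in range(HORIZON):
        state, reward = step(state, action)
        total += reward
        action = policy(state)
    return total

def action_preferences(state, policy):
    """Compare actions pairwise via roll-outs; emit preferences (better, worse)."""
    returns = {a: sum(rollout(state, a, policy) for _ in range(ROLLOUTS)) / ROLLOUTS
               for a in ACTIONS}
    return [(a, b) if returns[a] > returns[b] else (b, a)
            for a, b in combinations(ACTIONS, 2) if returns[a] != returns[b]]

def rank_actions(prefs):
    """Order actions by pairwise wins (a stand-in for a learned label ranker)."""
    wins = {a: 0 for a in ACTIONS}
    for better, _ in prefs:
        wins[better] += 1
    return sorted(ACTIONS, key=lambda a: wins[a], reverse=True)

if __name__ == "__main__":
    random.seed(0)
    uniform_policy = lambda s: random.choice(ACTIONS)
    prefs = action_preferences(state=0, policy=uniform_policy)
    print("preferences:", prefs)
    print("ranking (best first):", rank_actions(prefs))

In the paper's full scheme, such preferences collected across sampled states would be used to train a ranking function that generalizes over states; the win count above merely illustrates how qualitative comparisons, rather than numerical value estimates, drive the policy improvement step.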

Item type: Article
Published: 2012
Author(s): Fürnkranz, Johannes ; Hüllermeier, Eyke ; Cheng, Weiwei ; Park, Sang-Hyeun
Type of entry: Bibliography
Title: Preference-based Reinforcement Learning: A Formal Framework and a Policy Iteration Algorithm
Language: English
Year of publication: 2012
Journal or publication title: Machine Learning
Volume: 89
Issue: 1-2
DOI: 10.1007/s10994-012-5313-8
Uncontrolled keywords: Reinforcement learning, Preference learning
Division(s): 20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering
Date deposited: 26 Nov 2015 08:02
Last modified: 26 Nov 2015 08:02