A Bayesian Approach to Policy Recognition and State Representation Learning

Šošić, A. ; Zoubir, A. M. ; Koeppl, H. (2018)
A Bayesian Approach to Policy Recognition and State Representation Learning.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 40 (6)
doi: 10.1109/TPAMI.2017.2711024
Article

Abstract

Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used, e.g., for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g., they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.

Item Type: Article
Published: 2018
Creators: Šošić, A. ; Zoubir, A. M. ; Koeppl, H.
Type of entry: Bibliography
Title: A Bayesian Approach to Policy Recognition and State Representation Learning
Language: English
Date: 1 June 2018
Journal or Publication Title: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume of the journal: 40
Issue Number: 6
DOI: 10.1109/TPAMI.2017.2711024
URL / URN: https://ieeexplore.ieee.org/document/7937852
Divisions: 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications > Bioinspired Communication Systems
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications > Signal Processing
DFG-Collaborative Research Centres (incl. Transregio)
DFG-Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres
Central Institutions
Central Institutions > Centre for Cognitive Science (CCS)
DFG-Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > CRC 1053: MAKI – Multi-Mechanisms Adaptation for the Future Internet
DFG-Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > CRC 1053: MAKI – Multi-Mechanisms Adaptation for the Future Internet > C: Communication Mechanisms
DFG-Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > CRC 1053: MAKI – Multi-Mechanisms Adaptation for the Future Internet > C: Communication Mechanisms > Subproject C3: Content-centred perspective
Date Deposited: 03 May 2016 16:59
Last Modified: 15 Dec 2022 09:22