Šošić, A. (2018)
Learning Models of Behavior From Demonstration and Through Interaction.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
This dissertation is concerned with the autonomous learning of behavioral models for sequential decision-making. It addresses both the theoretical aspects of behavioral modeling, such as the learning of appropriate task representations, and the practical difficulties of algorithmic implementation.
The first half of the dissertation deals with the problem of learning from demonstration, which consists of generalizing the behavior of an expert demonstrator from observation data. Two alternative modeling paradigms are discussed. First, a nonparametric inference framework is developed to capture the behavior of the expert at the policy level. A key challenge in the design of the framework is to make minimal assumptions about the observed behavior type while dealing with a potentially infinite number of system states. Because the model order adapts automatically to the complexity of the demonstrated behavior, the proposed approach can capture stochastic expert policies of arbitrary structure. Second, a nonparametric inverse reinforcement learning framework based on subgoal modeling is proposed, which allows the expert behavior to be reconstructed efficiently at the intentional level. Unlike most existing approaches, the proposed methodology naturally handles periodic tasks and situations where the intentions of the expert change over time. By adaptively decomposing the decision-making problem into a series of task-related subproblems, both inference frameworks learn compact encodings of the expert behavior. For performance evaluation, the models are compared with existing frameworks on synthetic benchmark scenarios and on real-world data recorded with a KUKA lightweight robotic arm.
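The dissertation's concrete models are not reproduced here. As a loose illustration of the nonparametric idea summarized above, where the model order grows with the complexity of the demonstrated behavior, the following Python sketch clusters demonstrated actions into behavior modes under a Chinese-restaurant-process prior, so that the number of modes is inferred rather than fixed. All names, priors, and parameters are illustrative assumptions, not taken from the thesis.

```python
# Illustrative sketch only (not the dissertation's model): a Chinese-restaurant-process
# prior over behavior modes, each with a Dirichlet-categorical action distribution.
# The number of modes is not fixed in advance; it adapts to the demonstrated behavior.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 4   # assumed size of a discrete action space
ALPHA = 1.0     # CRP concentration: willingness to open a new behavior mode
BETA = 0.5      # symmetric Dirichlet prior over actions within a mode

# toy demonstrations: actions drawn from two hidden behavior modes
actions = np.concatenate([rng.choice(N_ACTIONS, 50, p=[0.7, 0.1, 0.1, 0.1]),
                          rng.choice(N_ACTIONS, 50, p=[0.1, 0.1, 0.1, 0.7])])

assign = np.zeros(len(actions), dtype=int)             # mode of each observation
counts = [np.bincount(actions, minlength=N_ACTIONS)]   # per-mode action counts

def predictive(c, a):
    """Posterior-predictive probability of action a under a mode with counts c."""
    return (c[a] + BETA) / (c.sum() + N_ACTIONS * BETA)

for _ in range(20):                                    # collapsed Gibbs sweeps
    for n, a in enumerate(actions):
        counts[assign[n]][a] -= 1                      # remove observation from its mode
        # CRP weights: existing modes in proportion to their size, a new mode via ALPHA
        weights = np.array([c.sum() * predictive(c, a) for c in counts]
                           + [ALPHA / N_ACTIONS])
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):                           # open a new behavior mode
            counts.append(np.zeros(N_ACTIONS, dtype=int))
        counts[k][a] += 1
        assign[n] = k

print("non-empty behavior modes found:", sum(1 for c in counts if c.sum() > 0))
```

On this toy data the sampler typically settles on a small number of modes (here, around two); a fixed-order model would instead require that number to be chosen in advance.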
In the second half of the work, the focus shifts to multi-agent modeling, with the aim of analyzing the decision-making process in large-scale homogeneous agent networks. To address the lack of decentralized system models with explicit agent homogeneity, a new class of agent systems is introduced. For this system class, the problem of inverse reinforcement learning is discussed and a meta-learning algorithm is devised that makes explicit use of the system symmetries. As part of the algorithm, a heterogeneous reinforcement learning scheme is proposed for optimizing the collective behavior of the system based on the local state observations made at the agent level. Finally, to scale the simulation of the network to large agent numbers, a continuum version of the model is derived. After a discussion of the system components and the associated optimality criteria, numerical examples of collective tasks demonstrate the capabilities of the continuum approach and its advantages over large-scale agent-based modeling.
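Again purely as an assumed illustration rather than the dissertation's formulation, the sketch below contrasts an agent-based simulation of a homogeneous network, in which every agent applies the same local policy to its neighborhood observation, with a continuum view that propagates the agent density directly; the per-step cost of the latter depends on the discretization, not on the number of agents.

```python
# Illustrative sketch only (assumed, not the dissertation's formulation): a homogeneous
# agent network on a ring of cells. Every agent applies the SAME local policy to the
# occupancy of its neighborhood; the continuum view updates the agent density instead
# of individual agents, so its cost per step is independent of the agent count.
import numpy as np

rng = np.random.default_rng(1)
N_CELLS, N_AGENTS, STEPS = 20, 200, 50

def local_policy(left, here, right):
    """Shared local rule: step toward a strictly less crowded neighboring cell."""
    if left < here and left <= right:
        return -1
    if right < here and right < left:
        return +1
    return 0

# --- agent-based simulation: identical policy for every agent (homogeneity) ---
pos = rng.integers(0, N_CELLS, size=N_AGENTS)
for _ in range(STEPS):
    occ = np.bincount(pos, minlength=N_CELLS)
    moves = [local_policy(occ[(p - 1) % N_CELLS], occ[p], occ[(p + 1) % N_CELLS])
             for p in pos]
    pos = (pos + np.array(moves)) % N_CELLS

# --- continuum view: apply the same rule to a density over cells ---
rho = np.bincount(rng.integers(0, N_CELLS, size=N_AGENTS),
                  minlength=N_CELLS).astype(float)
for _ in range(STEPS):
    new_rho = np.zeros(N_CELLS)
    for p in range(N_CELLS):
        d = local_policy(rho[(p - 1) % N_CELLS], rho[p], rho[(p + 1) % N_CELLS])
        new_rho[(p + d) % N_CELLS] += rho[p]
    rho = new_rho

print("agent-based occupancy:", np.bincount(pos, minlength=N_CELLS))
print("continuum density    :", rho.round(1))
```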
| Item type: | Dissertation |
|---|---|
| Published: | 2018 |
| Author(s): | Šošić, A. |
| Type of entry: | First publication |
| Title: | Learning Models of Behavior From Demonstration and Through Interaction |
| Language: | English |
| Referees: | Zoubir, Prof. Dr. Abdelhak M. ; Koeppl, Prof. Dr. Heinz |
| Year of publication: | 2018 |
| Place of publication: | Darmstadt |
| Date of oral examination: | 21 August 2018 |
| URL / URN: | https://tuprints.ulb.tu-darmstadt.de/8107 |
| URN: | urn:nbn:de:tuda-tuprints-81079 |
| Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science; 500 Natural sciences and mathematics > 510 Mathematics; 600 Technology, medicine, applied sciences > 620 Engineering and mechanical engineering |
| Department(s)/Division(s): | 18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik > Signalverarbeitung |
| Date deposited: | 21 Oct 2018 19:55 |
| Last modified: | 25 Oct 2018 09:00 |