Becker-Ehmck, Philip (2022)
Latent State-Space Models for Control.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00022489
Dissertation, first publication, publisher's version
Abstract
Learning to control robots without human supervision and prolonged engineering effort has been a long-term dream at the intersection of machine learning and robotics. If successful, it would enable many novel applications, from soft robotics and human-robot interaction to quick adaptation to unseen tasks or robotic setups. A key driving force behind this dream is the inherent limitation of classical control algorithms to low-dimensional, engineered state-spaces, which prohibits the use of high-dimensional sensors such as cameras or touchpads. As an alternative to classical control methods, reinforcement learning presumes no prior knowledge of a robot's dynamics and, paired with deep learning, opens the door to using high-dimensional sensory information of any kind. Yet, reinforcement learning has had only limited impact on real-time robot control due to its high demand for real-world interactions (among other reasons). Model-based approaches promise to be much more data-efficient, but present the challenge of engineering accurate simulators. As building a simulator comes with many of the same challenges as designing a controller, using engineered simulators is not a satisfactory solution for the generic goal of learning to control; most of the engineering work would still have to be done to build the simulator. Instead, learning such a model, in particular a latent state-space model (LSSM), promises to relieve us of engineering a simulator while still reaping the benefits of having one. A learned latent space can compactly represent high-dimensional sensor information and store all relevant information for prediction and control.

In this thesis, we show how to perform system identification of complex and nonlinear systems purely from raw, high-dimensional sensory data. Despite their complexity, such systems can often be approximated well by a set of linear dynamical systems if broken into appropriate subsequences. This mechanism not only helps us find good approximations of the dynamics, but also gives us deeper insight into the underlying system. Combining Bayesian inference, Variational Autoencoders and Concrete relaxations, we show how to learn a richer and more meaningful state-space, for example by encoding joint constraints or collisions with walls in a maze, from partial and high-dimensional observations. In a setting with time-varying dynamics, we show how our inference method for continuous switching variables can infer changing but unobserved physical properties that govern the dynamics of a system, such as masses or link lengths in robotic simulations. This inference happens online in our learned filter without retraining or fine-tuning of model parameters. Quantitatively, we find that such representations translate into a gain in accuracy of the learned dynamics, showcased on various simulated tasks, and that they promise to be helpful for policy optimization.

Building on this work, we show how this LSSM can be used to learn a probabilistic model of real-world robot dynamics, such as those of a self-built drone and a robot arm with 7 degrees of freedom. No prior knowledge of the flight dynamics or kinematics is assumed. On top of this model, we propose a novel model-based reinforcement learning method in which both a parameterized policy and a value function are optimized entirely by propagating stochastic analytic gradients through generated latent trajectories.
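The following is a minimal, self-contained sketch (not the thesis implementation) of two of the ideas described above, under assumed names, sizes and reward: a latent transition that mixes a bank of linear dynamical systems via a Concrete (Gumbel-Softmax) relaxation of a discrete switching variable, and policy improvement by backpropagating stochastic analytic gradients through imagined latent rollouts. Model learning via variational inference, the value function and the reward model are omitted for brevity.

```python
# Hypothetical sketch of a switching latent state-space model and
# backprop-through-rollout policy optimization; all names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, ACTION, MODES, HORIZON = 8, 2, 4, 15  # assumed dimensions


class SwitchingLinearDynamics(nn.Module):
    """z_{t+1} = sum_k w_k (A_k z_t + B_k u_t), with w ~ Concrete(switch(z_t, u_t))."""

    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(0.01 * torch.randn(MODES, LATENT, LATENT)
                              + torch.eye(LATENT))
        self.B = nn.Parameter(0.01 * torch.randn(MODES, LATENT, ACTION))
        self.switch = nn.Linear(LATENT + ACTION, MODES)  # logits of the switch variable

    def forward(self, z, u, temperature=1.0):
        logits = self.switch(torch.cat([z, u], dim=-1))
        # Concrete / Gumbel-Softmax relaxation keeps the discrete mode choice differentiable.
        w = F.gumbel_softmax(logits, tau=temperature, hard=False)  # (batch, MODES)
        z_next = torch.einsum('bk,kij,bj->bi', w, self.A, z) \
               + torch.einsum('bk,kij,bj->bi', w, self.B, u)
        return z_next, w


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.Tanh(),
                                 nn.Linear(64, 2 * ACTION))

    def forward(self, z):
        mean, log_std = self.net(z).chunk(2, dim=-1)
        # Reparameterized sample, so stochastic analytic gradients flow through the action.
        return mean + log_std.exp() * torch.randn_like(mean)


def imagined_return(dynamics, policy, z0, goal):
    """Roll the policy forward inside the learned latent model and sum a toy reward."""
    z, total = z0, 0.0
    for _ in range(HORIZON):
        u = policy(z)
        z, _ = dynamics(z, u)
        total = total - ((z[:, :goal.shape[-1]] - goal) ** 2).sum(-1)  # placeholder reward
    return total.mean()


if __name__ == "__main__":
    dynamics, policy = SwitchingLinearDynamics(), Policy()
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    z0, goal = torch.randn(32, LATENT), torch.zeros(32, 2)
    for _ in range(5):  # policy improvement happens purely inside the learned model
        loss = -imagined_return(dynamics, policy, z0, goal)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The Concrete relaxation is what makes the mode selection differentiable, so the gradient of the imagined return can propagate through the full rollout back into the policy parameters, which is the mechanism behind the "stochastic analytic gradients through generated latent trajectories" described above.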
Our learned controllers can fly a drone via thrust-attitude commands to a randomly placed marker in an enclosed environment, and steer a joint-velocity-controlled robot arm to random end-effector positions in Cartesian space. This is achieved with less than an hour of interaction on the real system. The control policy is learned entirely in the learned simulator and can be applied to the real system without modification or fine-tuning.

Lastly, we propose a novel exploration criterion for the development of autonomous agents: Empowerment Gain. Unlike other exploration criteria, this approach ties together an agent's entire perception-control loop and its current capabilities to act. Looking ahead, this method should help us learn models of the world that are actually relevant to realizing an agent's influence in it. A key insight is that learned models do not have to be perfect simulators of the entire world and all of its processes; rather, they need to convey the information necessary to enable an agent to interact with the world around it. We show how this criterion compares to, and in some ways incorporates, other intrinsic motivations such as novelty seeking, surprise minimization and learning progress. While our method still ensures exploration of the entire space, it prefers regions with greater potential for realizing an agent's influence in the world.

In conclusion, we give answers to three major questions: (1) how do we learn an LSSM from raw sensory data, (2) how do we use it for control, and (3) which parts of the world do we need to explore and model in the first place? While the last part remains at a theoretical and conceptual stage, we demonstrate the first two on two different real-world robotic platforms. Throughout, we focused on proposing general-purpose methods that are as broadly applicable as possible while still being successful in a real-world setting.
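As background for the Empowerment Gain criterion mentioned above: empowerment, in the sense of Klyubin et al., is the channel capacity between an agent's actions and the resulting future state. The block below only restates this standard background definition; the precise form of the gain criterion itself is defined in the thesis and is not reproduced here.

```latex
% Empowerment of a state s: the channel capacity from a sequence of n actions
% to the state reached after executing them, i.e. the maximal mutual information
% achievable by choosing the action distribution.
\begin{equation}
  \mathfrak{E}(s) \;=\; \max_{p(a_{1:n})} \, I\!\left(A_{1:n};\, S_{n} \,\middle|\, s\right)
\end{equation}
```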
Item type: | Dissertation
---|---
Published: | 2022
Author(s): | Becker-Ehmck, Philip
Type of entry: | First publication
Title: | Latent State-Space Models for Control
Language: | English
Referees: | Peters, Prof. Dr. Jan ; Hutter, Prof. Dr. Marco
Year of publication: | 2022
Place of publication: | Darmstadt
Collation: | xiii, 129 pages
Date of oral examination: | 26 September 2022
DOI: | 10.26083/tuprints-00022489
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/22489
Status: | Publisher's version
URN: | urn:nbn:de:tuda-tuprints-224895
Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science; 600 Technology, medicine, applied sciences > 600 Technology; 600 Technology, medicine, applied sciences > 620 Engineering and mechanical engineering
Department(s): | 20 Department of Computer Science; 20 Department of Computer Science > Intelligent Autonomous Systems
Date deposited: | 25 Nov 2022 12:34
Last modified: | 28 Nov 2022 09:06