Urain, Julen (2024)
Deep Generative Models for Motion Planning and Control.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00027565
Dissertation, first publication, publisher's version
Abstract
This thesis investigates the problem of robot motion generation, focusing primarily on data-driven motion generation. Traditionally, robot motion has been dictated by manually designed models. While effective for structured industrial environments where variability is minimal (e.g., palletizing robots handling similarly shaped boxes), these models fall short in more complex environments, such as domestic spaces filled with diverse objects like cutlery, bottles, and mugs.
Data-driven motion generation, often referred to as Learning from Demonstrations or Imitation Learning, is emerging as a promising solution for these complex environments. The goal is to teach robots desired behaviors from human expert demonstrations. The power of this approach is evident in its transformative impact on fields such as Computer Vision and Natural Language Processing, where deep generative models have successfully produced images and text. In Robotics, however, current data-driven models still struggle to achieve the same breadth of generalization and robustness.
In this context, this thesis draws inspiration from the successes of both image and text generation. A critical insight underpins our investigation: an important factor behind generalization in both image and text generation lies in the chosen architectures. While image generative models use architectures such as Convolutional Neural Networks to capture local geometric features in images, text generative models rely on structures such as Transformers to infer temporal features. With this in mind, the work presented in this thesis explores the following question: What architectural elements should we integrate into our generative models in order to generate robot movements correctly?
In this direction, this thesis proposes three works that explore the integration of different robotics-relevant properties (stability, geometry, and composability) into deep generative models. (1) With ImitationFlows, we study the problem of integrating global stability into motion policies. We propose a novel architecture that combines the expressiveness of Normalizing Flows with the guarantee of learning globally stable behaviors. We show that these models can be used to represent stable motion behaviors in the robot's end-effector space (6D position and orientation), a useful space for many robotic tasks. (2) With Composable Energy Policies, we study the problem of combining multiple motion policies to solve multi-objective problems. We explore the connections between multi-objective motion generation and Energy-Based Models and propose a novel model for combining energy-based policies represented in arbitrary spaces. (3) With SE(3)-DiffusionFields, we explore the problem of learning useful cost functions for Motion Planning. We propose to adapt Diffusion Models to the Lie group SE(3), which allows us to design Diffusion Models in the robot's end-effector space. We show that we can use these models to represent grasp pose distributions and use them as cost functions in Motion Planning problems. Each of the proposed methods has been evaluated in both simulated and real-world experiments to show its performance on real-world robotics problems, and we have open-sourced the codebases of the methods to encourage the community to build on our proposed solutions.
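To make the composability idea concrete, the usual way energy-based models combine objectives is by summing their energies, which corresponds to a product of the individual policy distributions. The following is a minimal illustrative formulation of this general principle (the weights $\beta_i$ are an assumed design choice for this sketch, not necessarily the exact formulation used in Composable Energy Policies):

$$\pi(\mathbf{a} \mid \mathbf{x}) \;\propto\; \prod_i \exp\big(-\beta_i E_i(\mathbf{a}, \mathbf{x})\big) \;=\; \exp\Big(-\sum_i \beta_i E_i(\mathbf{a}, \mathbf{x})\Big),$$

where each $E_i(\mathbf{a}, \mathbf{x})$ scores an action $\mathbf{a}$ in state $\mathbf{x}$ under one objective (e.g., reaching a goal or avoiding an obstacle) and the weights $\beta_i \geq 0$ trade the objectives off against each other. In the same spirit, a learned grasp-pose energy such as the one provided by SE(3)-DiffusionFields can enter a motion planning problem as one additional cost term.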
Overall, this thesis explores novel ways to apply deep generative models to robotics problems. We show the benefit of integrating robotics-relevant features such as geometry and composability with deep generative models, thereby benefiting from the expressiveness of deep generative models while improving generalization thanks to properly chosen inductive biases.
Item type: Dissertation
Published: 2024
Author(s): Urain, Julen
Type of entry: First publication
Title: Deep Generative Models for Motion Planning and Control
Language: English
Referees: Peters, Prof. Jan ; Fragkiadaki, Prof. Katerina
Date of publication: 16 July 2024
Place of publication: Darmstadt
Collation: xv, 173 pages
Date of oral examination: 18 December 2023
DOI: 10.26083/tuprints-00027565
URL / URN: https://tuprints.ulb.tu-darmstadt.de/27565
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-275657
Dewey Decimal Classification (DDC): 000 Generalities, computer science, information science > 004 Computer science; 600 Technology, medicine, applied sciences > 620 Engineering and mechanical engineering
Department(s): 20 Department of Computer Science; 20 Department of Computer Science > Intelligent Autonomous Systems
Date deposited: 16 Jul 2024 08:38
Last modified: 17 Jul 2024 12:20