Zelch, Christoph (2024)
Iterative Synthesis of Extremal Fields for Near-Optimal Feedback Control of Robotic Systems.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00027577
Doctoral thesis, first publication, publisher's version
Abstract
Optimal control of robots, vehicles, or industrial plants is essential, as it can make these systems operate considerably better, e.g., faster or more energy-efficiently, than hand-crafted control policies. Optimal control theory and (numerical) methods allow the computation of control sequences for high-dimensional dynamic systems by mathematically defining high-level goals. They rely on mathematical nonlinear dynamics models of such systems, which are often available in high quality for robots and vehicles, typically based on first principles of physics (white-box approaches). However, if the computed sequence of optimal actions is applied to a real robot, the system’s states will eventually deviate from the precomputed trajectory due to inevitable model inaccuracies or unforeseen perturbations. This motivates the search for a nonlinear feedback controller that provides optimal control values not only on an optimal path but in real time for arbitrary system states, which allows the controlled system to proceed optimally even in the presence of disturbances. Explicit formulations of optimal feedback controllers only exist for certain systems, e.g., with linear dynamics and quadratic cost functions, but not for general robots with nonlinear system dynamics. In contrast to white-box approaches based on explicit mathematical models of system dynamics, machine learning approaches based on data-driven black-box models can learn optimal feedback control policies for more general optimal control problems with nonlinear systems. However, they depend crucially on the training scenarios used to collect large amounts of data and cannot generalize well beyond these, while white-box approaches are often also useful in scenarios that have not been encountered before. The main motivation for this thesis is to investigate the combination of white-box optimal control approaches and black-box machine learning to benefit from the advantages of both concepts.
The focus is on the extremal field approach, where a near-optimal feedback control policy is learned from a set of optimal reference trajectories, the extremal field. It exploits the advantages of machine learning approaches and, at the same time, leverages the capabilities of available numerical optimal control solvers that allow the incorporation of knowledge about the problem structure and the consideration of nonlinear constraints. In this work, the reference trajectories are computed iteratively from carefully selected start states to use the information provided by previously computed trajectories and the current feedback control policy approximation. Because of the curse of dimensionality, it is challenging to cover high-dimensional joint spaces with sufficient training data, which makes it necessary to focus on small subspaces relevant to a specific task. To address the problem of simultaneously sufficient and efficient coverage of a relevant part of the joint space, three complementary start state selection strategies for the computation of the extremal field are developed. They utilize information from the optimal control solver, from already computed optimal trajectories, and from uncertainty information provided by the current approximation of the feedback policy. Further, a switch-over to a proportional-integral (PI) controller in the vicinity of a goal state is proposed to stabilize the system around this state without the need for large amounts of training data in this area. The interpolation between the optimal trajectories to fit the feedback control policy is an essential part of the extremal field approach. It imposes specific requirements on the approximation methods, which are formulated in this work. Two ubiquitous function approximation methods, Gaussian processes and artificial neural networks, are compared and analyzed regarding their suitability for the approximation of optimal feedback control policies with respect to these requirements.
The quality of the feedback control approximation in the extremal field approach can be degraded if data from multiple different solution clusters is merged, since the approximation method may directly interpolate between different solutions and, thus, blur their structures. Current trajectory clustering approaches capable of addressing this problem are often learning-based or use pointwise Euclidean distances between two trajectories. A rule-based trajectory clustering approach is developed, which extracts characteristic features from the graphs of motion trajectories to create a compressed trajectory representation. This representation can be used in an existing string kernel-based distance measure. The proposed methods are evaluated on different robot models with nonlinear dynamics in simulation (including a detailed nonlinear dynamics model of an industrial robot arm) and in physical experiments (Furuta pendulum arm).
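The workflow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and every name and parameter in it is hypothetical. It assumes a scalar toy system x' = x^3 + u with running cost x^2 + u^2, for which the optimal feedback is known in closed form from the scalar Hamilton-Jacobi-Bellman equation; that closed form stands in for a numerical optimal control solver. Piecewise-linear interpolation stands in for the Gaussian process or neural network, and the distance to the nearest sample stands in for the uncertainty-based start state selection.

```python
import bisect
import math


def f(x):
    # Drift of the assumed toy system x' = f(x) + u.
    return x ** 3


def optimal_control(x):
    # Closed-form minimizer of the integral of x^2 + u^2 for the toy system.
    # Stand-in for a numerical optimal control solver (assumption).
    return -f(x) - math.copysign(math.sqrt(f(x) ** 2 + x ** 2), x)


def solve_trajectory(x0, dt=0.01, goal_radius=0.05, max_steps=5000):
    # One optimal reference trajectory, returned as (state, control) samples.
    samples, x = [], x0
    for _ in range(max_steps):
        u = optimal_control(x)
        samples.append((x, u))
        if abs(x) < goal_radius:
            break
        x += dt * (f(x) + u)  # explicit Euler step
    return samples


def fit_policy(samples):
    # Piecewise-linear interpolation between samples of the extremal field.
    table = sorted(dict(samples).items())  # dict removes duplicate states
    xs = [x for x, _ in table]
    us = [u for _, u in table]

    def policy(x):
        i = bisect.bisect_left(xs, x)
        if i == 0:
            return us[0]
        if i == len(xs):
            return us[-1]
        w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return (1.0 - w) * us[i - 1] + w * us[i]

    return policy


def build_extremal_field(candidates, first_start, iterations):
    # Iteratively add the candidate start state farthest from existing data
    # (a simple proxy for high policy uncertainty).
    samples = solve_trajectory(first_start)
    for _ in range(iterations):
        xs = [x for x, _ in samples]
        start = max(candidates, key=lambda c: min(abs(c - x) for x in xs))
        samples += solve_trajectory(start)
    return fit_policy(samples)


def run_closed_loop(policy, x0, dt=0.01, steps=1500,
                    switch_radius=0.1, kp=2.0, ki=1.0):
    # Apply the learned policy; switch to a PI controller near the goal x = 0.
    x, integral = x0, 0.0
    for _ in range(steps):
        if abs(x) < switch_radius:
            integral += dt * x
            u = -kp * x - ki * integral
        else:
            u = policy(x)
        x += dt * (f(x) + u)
    return x


policy = build_extremal_field([-2.0, -1.0, 0.0, 1.0, 2.0],
                              first_start=1.0, iterations=2)
print(run_closed_loop(policy, x0=1.7))  # final state, expected near the goal
```

The start state 1.7 is deliberately not one of the trajectory start states, so the closed loop relies on interpolation within the field. In a higher-dimensional setting the one-dimensional interpolation would no longer apply, which is where the choice of approximation method and the start state selection discussed in the abstract become relevant.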
Item type: | Doctoral thesis | ||||
---|---|---|---|---|---|
Published: | 2024 | ||||
Author(s): | Zelch, Christoph | ||||
Type of entry: | First publication | ||||
Title: | Iterative Synthesis of Extremal Fields for Near-Optimal Feedback Control of Robotic Systems | ||||
Language: | English | ||||
Referees: | Stryk, Prof. Dr. Oskar von ; Conway, Prof. Ph.D Bruce A. | ||||
Date of publication: | 23 October 2024 | ||||
Place of publication: | Darmstadt | ||||
Collation: | xix, 175 pages | ||||
Date of oral examination: | 18 March 2024 | ||||
DOI: | 10.26083/tuprints-00027577 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/27577 | ||||
Status: | Publisher's version | ||||
URN: | urn:nbn:de:tuda-tuprints-275778 | ||||
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science | ||||
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Simulation, Systems Optimization and Robotics |
Date deposited: | 23 Oct 2024 12:05 | ||||
Last modified: | 25 Oct 2024 12:44 | ||||