Stelzner, Karl (2023)
Elements of Unsupervised Scene Understanding: Objectives, Structures, and Modalities.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026355
Dissertation, Erstveröffentlichung, Verlagsversion
Kurzbeschreibung (Abstract)
Enabling robust interactions between automated systems and the real world is a major goal of artificial intelligence. A key ingredient towards this goal is scene understanding: the ability to process visual imagery into a concise representation of the depicted scene, including the identity, position, and geometry of objects. While supervised deep learning approaches have proven effective at processing visual inputs, the cost of supplying human annotations for training quickly becomes infeasible as the diversity of the inputs and the required level of detail increases, putting full real-world scene understanding out of reach.
For this reason, this thesis investigates unsupervised methods to scene understanding. In particular, we utilize generative models with structured latent variables to facilitate the learning of object-based representations. We start our investigation in an autoencoding setting, where we highlight the capability of such systems to identify objects without human supervision, as well as the advantages of integrating tractable components within them. At the same time, we identify some limitations of this setting, which prevent success in more visually complex environments. Based on this, we then turn to video data, where we leverage the prediction of dynamics to both regularize the representation learning task and to enable applications to reinforcement learning. Finally, to take another step towards a real world setting, we investigate the learning of representations encoding 3D geometry. We discuss various methods to encode and learn about 3D scene structure, and present a model which simultaneously infers the geometry of a given scene, and segments it into objects.
We conclude by discussing future challenges and lessons learned. In particular, we touch on the challenge of modelling uncertainty when inferring 3D geometry, the tradeoffs between various data sources, and the cost of including model structure.
Typ des Eintrags: | Dissertation | ||||
---|---|---|---|---|---|
Erschienen: | 2023 | ||||
Autor(en): | Stelzner, Karl | ||||
Art des Eintrags: | Erstveröffentlichung | ||||
Titel: | Elements of Unsupervised Scene Understanding: Objectives, Structures, and Modalities | ||||
Sprache: | Englisch | ||||
Referenten: | Kersting, Prof. Dr. Kristian ; Kosiorek, PhD Adam R. ; Vergari, Prof. Dr. Antonio | ||||
Publikationsjahr: | 13 Dezember 2023 | ||||
Ort: | Darmstadt | ||||
Kollation: | xv, 150 Seiten | ||||
Datum der mündlichen Prüfung: | 21 November 2023 | ||||
DOI: | 10.26083/tuprints-00026355 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/26355 | ||||
Kurzbeschreibung (Abstract): | Enabling robust interactions between automated systems and the real world is a major goal of artificial intelligence. A key ingredient towards this goal is scene understanding: the ability to process visual imagery into a concise representation of the depicted scene, including the identity, position, and geometry of objects. While supervised deep learning approaches have proven effective at processing visual inputs, the cost of supplying human annotations for training quickly becomes infeasible as the diversity of the inputs and the required level of detail increases, putting full real-world scene understanding out of reach. For this reason, this thesis investigates unsupervised methods to scene understanding. In particular, we utilize generative models with structured latent variables to facilitate the learning of object-based representations. We start our investigation in an autoencoding setting, where we highlight the capability of such systems to identify objects without human supervision, as well as the advantages of integrating tractable components within them. At the same time, we identify some limitations of this setting, which prevent success in more visually complex environments. Based on this, we then turn to video data, where we leverage the prediction of dynamics to both regularize the representation learning task and to enable applications to reinforcement learning. Finally, to take another step towards a real world setting, we investigate the learning of representations encoding 3D geometry. We discuss various methods to encode and learn about 3D scene structure, and present a model which simultaneously infers the geometry of a given scene, and segments it into objects. We conclude by discussing future challenges and lessons learned. In particular, we touch on the challenge of modelling uncertainty when inferring 3D geometry, the tradeoffs between various data sources, and the cost of including model structure. |
||||
Alternatives oder übersetztes Abstract: |
|
||||
Status: | Verlagsversion | ||||
URN: | urn:nbn:de:tuda-tuprints-263552 | ||||
Sachgruppe der Dewey Dezimalklassifikatin (DDC): | 000 Allgemeines, Informatik, Informationswissenschaft > 004 Informatik | ||||
Fachbereich(e)/-gebiet(e): | 20 Fachbereich Informatik 20 Fachbereich Informatik > Künstliche Intelligenz und Maschinelles Lernen |
||||
Hinterlegungsdatum: | 13 Dez 2023 13:02 | ||||
Letzte Änderung: | 15 Dez 2023 08:14 | ||||
PPN: | |||||
Referenten: | Kersting, Prof. Dr. Kristian ; Kosiorek, PhD Adam R. ; Vergari, Prof. Dr. Antonio | ||||
Datum der mündlichen Prüfung / Verteidigung / mdl. Prüfung: | 21 November 2023 | ||||
Export: | |||||
Suche nach Titel in: | TUfind oder in Google |
Frage zum Eintrag |
Optionen (nur für Redakteure)
Redaktionelle Details anzeigen |