Zauss, Duncan (2021)
Perceiving and Predicting Semantic Keypoints.
École polytechnique fédérale de Lausanne; Technische Universität Darmstadt
doi: 10.26083/tuprints-00019453
Master's thesis, first publication, publisher's version
Abstract
This work tackles the inherently ambiguous task of predicting 3D human poses from monocular RGB images and presents two approaches. First, a fully connected neural network is trained to lift 2D joint positions, which can be obtained with any off-the-shelf 2D human pose estimation algorithm, to 3D poses. Since 3D human pose datasets are limited and the joint definitions of 2D and 3D human pose estimation datasets often do not match, we create a synthetic ground truth; by this means, our model can learn to lift arbitrary sets of keypoints to 3D. Our experiments show that we achieve competitive results on the Human3.6M benchmark without using any of the Human3.6M training data. Second, we propose a new fully convolutional architecture that encodes 3D poses with composite fields. Our method learns 3D vectors that point from a central position of the human body to each of the body's joints in 3D space. This model achieves competitive results on the challenging 3D Poses in the Wild dataset and runs at 21 FPS, making it real-time capable.
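The two approaches summarised above can be sketched in a few lines of NumPy. The layer sizes, the 17-joint skeleton, and the random (untrained) weights below are illustrative assumptions for exposition only, not the architecture or trained model from the thesis:

```python
import numpy as np

NUM_JOINTS = 17  # assumed skeleton size; the thesis lifts arbitrary keypoint sets

def init_lifting_mlp(hidden=256, seed=0):
    """Randomly initialised weights for a minimal 2D-to-3D lifting MLP (sketch only)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.01, (NUM_JOINTS * 2, hidden))
    w2 = rng.normal(0.0, 0.01, (hidden, NUM_JOINTS * 3))
    return w1, w2

def lift_2d_to_3d(keypoints_2d, w1, w2):
    """Approach 1 (sketch): map (NUM_JOINTS, 2) pixel coordinates to (NUM_JOINTS, 3) positions."""
    x = keypoints_2d.reshape(-1)    # flatten all 2D joints into one feature vector
    h = np.maximum(x @ w1, 0.0)     # ReLU hidden layer
    return (h @ w2).reshape(NUM_JOINTS, 3)

def decode_composite_field(center_3d, joint_offsets):
    """Approach 2 (sketch): recover joints from a body-centre position plus learned 3D offsets."""
    return center_3d + joint_offsets  # broadcasting adds the centre to every offset vector

# Lift one arbitrary 2D pose (weights are untrained, so the output is illustrative)
pose_2d = np.random.default_rng(1).uniform(0, 640, (NUM_JOINTS, 2))
pose_3d = lift_2d_to_3d(pose_2d, *init_lifting_mlp())
print(pose_3d.shape)  # (17, 3)

# Decode joints from a hypothetical body centre and zero offsets
joints = decode_composite_field(np.array([0.0, 0.0, 3.0]), np.zeros((NUM_JOINTS, 3)))
print(joints.shape)  # (17, 3)
```

In the actual systems these weights and offset fields would be learned from data; the sketch only shows the shape of the computation each approach performs at inference time.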
Item type: | Master's thesis |
---|---|
Published: | 2021 |
Author(s): | Zauss, Duncan |
Type of entry: | First publication |
Title: | Perceiving and Predicting Semantic Keypoints |
Language: | English |
Referees: | Schäfer, Prof. Dr. Michael ; Alahi, Prof. Dr. Alexandre |
Year of publication: | 2021 |
Place of publication: | Darmstadt |
Collation: | xi, 50, xxviii pages |
DOI: | 10.26083/tuprints-00019453 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/19453 |
Status: | Publisher's version |
URN: | urn:nbn:de:tuda-tuprints-194534 |
Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science |
Division(s): | Study Areas > Study Area Computational Engineering |
Date deposited: | 10 Sep 2021 12:13 |
Last modified: | 22 Nov 2023 11:05 |