Perceiving and Predicting Semantic Keypoints

Zauss, Duncan (2021)
Perceiving and Predicting Semantic Keypoints.
École polytechnique fédérale de Lausanne; Technische Universität Darmstadt
doi: 10.26083/tuprints-00019453
Master's thesis, first publication, publisher's version

Abstract

This work tackles the inherently ambiguous task of predicting 3D human poses from monocular RGB images and presents two approaches. Firstly, a fully connected neural network is trained to lift 2D joint positions, which can be obtained with any off-the-shelf 2D human pose estimation algorithm, to 3D poses. Since 3D human pose datasets are limited and the joint locations of 2D and 3D human pose estimation datasets often do not match, we create a synthetic ground truth. In this way, our model can learn to lift arbitrary sets of keypoints to 3D. Our experiments show that we achieve competitive results on the Human3.6M benchmark without using any of the Human3.6M training data. Secondly, we propose a new fully convolutional architecture that encodes 3D poses with composite fields. Our method learns 3D vectors that point from a central position of the human body to all of the human's joints in 3D space. Our model achieves competitive results on the challenging 3D Poses in the Wild (3DPW) dataset. Furthermore, our model runs at 21 FPS, which makes it real-time capable.
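
The two approaches described in the abstract can be pictured with a short, purely illustrative example. The PyTorch snippet below is a minimal sketch, not the thesis code: the names (LiftingNetwork, decode_composite_field), the layer sizes, and the 17-joint skeleton are all assumptions made for this example.

```python
# Minimal illustrative sketch (not the thesis code) of the two ideas in
# the abstract. Assumes PyTorch; all names, layer sizes, and the
# 17-joint skeleton are assumptions made for this example.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17  # assumed COCO-style skeleton


class LiftingNetwork(nn.Module):
    """Approach 1: a fully connected network that lifts 2D joints,
    obtained from any off-the-shelf 2D pose estimator, to 3D poses."""

    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_KEYPOINTS * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_KEYPOINTS * 3),  # one (x, y, z) per joint
        )

    def forward(self, kp2d: torch.Tensor) -> torch.Tensor:
        # kp2d: (batch, NUM_KEYPOINTS, 2) -> (batch, NUM_KEYPOINTS, 3)
        batch = kp2d.shape[0]
        out = self.net(kp2d.reshape(batch, -1))
        return out.reshape(batch, NUM_KEYPOINTS, 3)


def decode_composite_field(center_3d: torch.Tensor,
                           offsets: torch.Tensor) -> torch.Tensor:
    """Approach 2 (decoding step): the network predicts, at a central
    body position, one 3D vector per joint; the pose is recovered by
    adding each offset to the detected center."""
    # center_3d: (3,), offsets: (NUM_KEYPOINTS, 3) -> (NUM_KEYPOINTS, 3)
    return center_3d.unsqueeze(0) + offsets


if __name__ == "__main__":
    model = LiftingNetwork()
    pose3d = model(torch.rand(1, NUM_KEYPOINTS, 2))  # lift one detection
    print(pose3d.shape)  # torch.Size([1, 17, 3])
```

Per the abstract, the lifting network is trained on synthetic ground truth rather than on Human3.6M data, which is why it can lift arbitrary keypoint sets; the sketch above only illustrates the input and output shapes of the two approaches.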

Item type: Master's thesis
Published: 2021
Author(s): Zauss, Duncan
Type of entry: First publication
Title: Perceiving and Predicting Semantic Keypoints
Language: English
Referees: Schäfer, Prof. Dr. Michael ; Alahi, Prof. Dr. Alexandre
Year of publication: 2021
Place of publication: Darmstadt
Collation: xi, 50, xxviii pages
DOI: 10.26083/tuprints-00019453
URL / URN: https://tuprints.ulb.tu-darmstadt.de/19453
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-194534
Dewey Decimal Classification (DDC) subject group: 000 Generalities, computer science, information science > 004 Computer science
Division(s): Study Areas
Study Areas > Study Area Computational Engineering
Date deposited: 10 Sep 2021 12:13
Last modified: 22 Nov 2023 11:05