Fritz, Mario (2009)
Modeling, Representing and Learning of Visual Categories.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
This thesis is concerned with modeling, representing and learning visual categories for the automatic recognition and detection of objects in image data. Applications of such methods range from image-based retrieval and driver assistance systems for the automotive industry to robotics. Despite the exciting progress in visual object categorization over the last 5 years, we still have a long way to go to match the perceptual capabilities of humans. While humans can recognize well over 10,000 categories, machines can currently recognize only close to 300 categories with moderate accuracy in constrained settings. For more complex tasks, the number of categories is an order of magnitude lower. Existing approaches reveal a surprising diversity in the way they model, represent and learn visual categories. To a large extent, this diversity results from the different scenarios and categories investigated in the literature. This motivated us to develop methods that combine the capabilities of previous methods along these 3 axes: modeling, representing and learning. The resulting approaches turn out to be more adaptive and show better performance in recognition and detection tasks on standard datasets. Accordingly, the scientific contribution of this thesis is structured into 3 parts:

Combination of different modeling paradigms: One basic difference in modeling is whether a method models the similarities within one category or the differences with respect to other categories. Since both views have their advantages and drawbacks, we have developed a hybrid approach that successfully combines the strengths of both.

Combination of different learning paradigms: While supervised approaches typically perform better, the high annotation effort poses a major obstacle to scaling to a larger number of recognizable categories. Unsupervised methods, in combination with the overwhelming amount of data at hand (e.g. internet search), constitute an appealing alternative. Given this background, we developed a method that makes use of different levels of supervision and consequently achieves better performance by also considering unannotated data.

Combination of different representation paradigms: Previous approaches differ strongly in the way they represent visual information. Representations range from local structures through line segments to global silhouettes. We present an approach that learns an effective representation directly from the image data and thereby extracts structures that combine the mentioned representation paradigms in a single approach.
Entry type: | Dissertation
---|---
Published: | 2009
Author(s): | Fritz, Mario
Type of entry: | First publication
Title: | Modeling, Representing and Learning of Visual Categories
Language: | English
Referees: | Schiele, Prof. Bernt ; Perona, Prof. Pietro
Publication date: | 16 June 2009
Place: | Darmstadt
Publisher: | Technische Universität
Date of oral examination: | 8 August 2008
URL / URN: | urn:nbn:de:tuda-tuprints-14046
Keywords: | computer vision, object recognition, object detection, machine learning, visual categorization
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Multimodal Interactive Systems
Date deposited: | 24 Jun 2009 10:48
Last modified: | 26 Aug 2018 21:25