Hättasch, Benjamin (2017)
Automated Ontology Refinement Using Compression-Based Learning.
Technische Universität Darmstadt
Master's thesis, Bibliography
Abstract
In this thesis, we propose an approach to refine ontologies for a given domain based on training corpora. We use the Minimum Description Length principle to assess the fit between ontology and text and to identify suitable refinement operations.
This requires computing a score that is based on finding a representation of the text using the ontology. We propose restrictions to the search space and introduce heuristic functions to find the representation in a reasonable amount of time. Further heuristics are suggested to find modifications that improve the fit without the need to try every possible operation. We implement a framework for the refinement process that contains several refinement operations and can easily be extended with others.
The functionality of the approach and the correctness of the implementation are tested with an extensive series of experiments. Synthetic data is used to confirm our hypotheses; afterwards, the algorithms are applied to real data. We also show that our system copes with large corpora containing millions of words. The resulting ontologies are evaluated using well-known metrics from ontology engineering. They could then be used in any natural language processing approach that depends on ontologies.
Additionally, we show how parts of our system can be used to solve natural language processing tasks directly. We suggest how its theoretical foundation can be used in classification tasks and show a practical application of such a task, namely semantic topic detection.
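The Minimum Description Length idea named in the abstract can be illustrated with a toy two-part score: the bits needed to describe the model plus the bits needed to describe the text given the model. The sketch below is not the thesis's method. It reduces the "ontology" to a flat set of terms, and the cost model (8 bits per character, a uniform code with one escape symbol) and all function names are assumptions chosen for illustration only.

```python
import math

BITS_PER_CHAR = 8  # assumed flat cost for spelling out a string


def model_bits(terms):
    """L(M): cost of spelling out every term in the toy model once."""
    return sum(len(t) * BITS_PER_CHAR for t in terms)


def data_bits(tokens, terms):
    """L(D|M): each token costs one symbol from a uniform code over the
    terms plus an escape symbol; unknown tokens are spelled out as well."""
    symbol_bits = math.log2(len(terms) + 1)
    total = 0.0
    for tok in tokens:
        total += symbol_bits
        if tok not in terms:
            total += len(tok) * BITS_PER_CHAR
    return total


def description_length(tokens, terms):
    """Two-part MDL score: L(M) + L(D|M)."""
    return model_bits(terms) + data_bits(tokens, terms)


def try_add_term(tokens, terms, candidate):
    """Hypothetical refinement operation: keep the candidate term only
    if it shortens the total description length."""
    extended = terms | {candidate}
    if description_length(tokens, extended) < description_length(tokens, terms):
        return extended
    return terms


tokens = "gene protein gene cell gene protein".split()
terms = {"gene"}
terms = try_add_term(tokens, terms, "protein")  # accepted: occurs twice
terms = try_add_term(tokens, terms, "cell")     # rejected: occurs only once
print(sorted(terms))
```

In this toy setting a term pays for itself only when it recurs often enough in the corpus, which is the trade-off an MDL score is meant to capture.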
Item type: | Master's thesis
---|---
Published: | 2017
Author(s): | Hättasch, Benjamin
Entry kind: | Bibliography
Title: | Automated Ontology Refinement Using Compression-Based Learning
Language: | English
Referees: | Fürnkranz, Prof. Dr. Johannes ; Vreeken, Dr. Jilles
Date of publication: | 4 December 2017
Department(s)/field(s): | 20 Department of Computer Science
Date deposited: | 04 Dec 2017
Last modified: | 29 Apr 2019 12:18