Hättasch, Benjamin (2017):
Automated Ontology Refinement Using Compression-Based Learning.
TU Darmstadt, [Master Thesis]
Abstract
In this thesis, we propose an approach to refine ontologies for a given domain based on training corpora. We use the Minimum Description Length principle to assess the fit between ontology and text and to identify suitable refinement operations.
This requires calculating a score that is based on finding a representation of the text in terms of the ontology. We propose restrictions to the search space and introduce heuristic functions to find the representation in a reasonable amount of time. Further heuristics are suggested to find modifications that improve the fit without having to try every possible operation. We implement a framework for the refinement process that contains a few refinement operations and can easily be extended with others.
The functionality of the approach as well as the correctness of the implementation is tested in an extensive series of experiments. Synthetic data is used to confirm our hypotheses; afterwards, the algorithms are applied to real data. We also show that our system copes with large corpora containing millions of words. The resulting ontologies are evaluated using well-known metrics from ontology engineering. They could then be used in any natural language processing approach that relies on ontologies.
Additionally, we show how parts of our system can be used to solve natural language processing tasks directly. We suggest how its theoretical foundation can be used in classification tasks and show a practical application for such a task, namely semantic topic detection.
Item Type: | Master Thesis
---|---
Published: | 2017
Creators: | Hättasch, Benjamin
Title: | Automated Ontology Refinement Using Compression-Based Learning
Language: | English
Divisions: | 20 Department of Computer Science
Date Deposited: | 04 Dec 2017
Referees: | Fürnkranz, Prof. Dr. Johannes and Vreeken, Dr. Jilles