Automatic recognition of German news focusing on future-directed beliefs and intentions

Eckle-Kohler, Judith ; Kohler, Michael ; Mehnert, Jens (2008)
Automatic recognition of German news focusing on future-directed beliefs and intentions.
In: Computer Speech and Language, 22 (4)
Article, Bibliography

Abstract

We consider the classification of German news stories as either focusing on future-directed beliefs and intentions or lacking these. The method proposed in this article requires only a small set of labeled training data. Instead of relying on manual labels, we introduce German clues for the automatic identification of future-orientation, which are used for automatic labeling of Reuters news stories. We describe the development of a high-precision procedure for automatic labeling in a bootstrapping fashion: a first version of the labeling procedure uses the absence of clues for future-directedness as an indicator of non-future-directedness and is able to automatically label about one-third of the Reuters news stories with high precision. Then a perceptron is applied to the automatically labeled news stories in order to semi-automatically acquire an additional set of clues for non-future-directedness. The second version of the labeling procedure additionally uses these clues and achieves remarkably improved results in terms of recall; it can even be extended by a guessing step to perform classification with an error of 22.5%. We also investigate another way to increase recall: using the automatically labeled news stories as training data for statistical classifiers. Three different types of statistical classifiers are applied in order to address the question of which classifier is best suited to the text classification task considered. The best statistical classifier, combined with the results of improved automatic labeling, is able to recognize the two classes of news stories with an error of 19%.
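
The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clue words in `FUTURE_CLUES`, the `min_hits` threshold, the bag-of-words features, and all function names are assumptions made for illustration, and the article's actual clue sets are not reproduced here. The sketch labels a story only when the clue evidence is clear, abstains otherwise, and then trains a plain perceptron on the automatically labeled stories to surface candidate clues for non-future-directedness.

```python
from collections import defaultdict

# Hypothetical clue words, chosen for illustration only (not the article's clue list).
FUTURE_CLUES = {"wird", "werden", "will", "plant", "erwartet", "künftig"}

PUNCT = ".,;:!?\"()"


def tokenize(text):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def label_by_clues(text, min_hits=2):
    """Return 'future', 'non-future', or None (abstain).

    Absence of any clue is taken as evidence for 'non-future';
    min_hits is an assumed threshold, not a value from the article.
    """
    hits = sum(1 for t in tokenize(text) if t in FUTURE_CLUES)
    if hits >= min_hits:
        return "future"
    if hits == 0:
        return "non-future"
    return None  # ambiguous evidence: leave the story unlabeled


def train_perceptron(labeled, epochs=10):
    """Plain perceptron over binary bag-of-words; +1 = non-future, -1 = future."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in labeled:
            y = 1 if label == "non-future" else -1
            # Known future clues are excluded so the model must find new evidence.
            tokens = set(tokenize(text)) - FUTURE_CLUES
            score = bias + sum(weights[t] for t in tokens)
            if y * score <= 0:  # misclassified: standard perceptron update
                for t in tokens:
                    weights[t] += y
                bias += y
    return weights


def candidate_non_future_clues(weights, k=5):
    """Tokens with the largest positive weights, for manual review."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, v in ranked if v > 0][:k]
```

In such a setup, the ranked tokens would be reviewed by hand before being added as clues, which corresponds to the semi-automatic acquisition step the abstract mentions.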

Entry type: Article
Published: 2008
Author(s): Eckle-Kohler, Judith ; Kohler, Michael ; Mehnert, Jens
Record type: Bibliography
Title: Automatic recognition of German news focusing on future-directed beliefs and intentions
Language: English
Publication date: October 2008
Title of journal, newspaper, or series: Computer Speech and Language
Volume: 22
Issue number: 4
URL / URN: https://www.sciencedirect.com/science/article/pii/S088523080...

ID number: TUD-CS-2008-11509
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 26 Sep 2018 12:06