
Teaching "Unstructured Information Management: Theory and Applications" to Computational Linguistics Students

Gurevych, Iryna and Müller, Christof and Zesch, Torsten (2007):
Teaching "Unstructured Information Management: Theory and Applications" to Computational Linguistics Students.
In: Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology, [Online-Edition: https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/pu...],
[Conference or Workshop Item]

Abstract

Students in Computational Linguistics often lack experience in building robust and scalable software components. Thus, student projects tend to be unstable and to work only under very special preconditions (e.g., a project has to be installed in a certain directory, or handles only single files instead of whole directories). Furthermore, if students have to build a system from scratch, they have to concentrate on input and output issues, as well as on connecting numerous preprocessing components that were not designed to work together. This limits the scope of feasible course tasks to relatively simple ones, like implementing yet another tokenizer. When offering the course "Unstructured Information Management: Theory and Applications" (http://www.ukp.tu-darmstadt.de/teaching/ws0607/UIMseminar) as part of the B.A./M.A. program of International Studies in Computational Linguistics at the University of Tübingen, our motivation was to familiarize students with fundamental concepts in unstructured information management and Natural Language Processing (NLP) middleware. This should enable students of computational linguistics to work on more challenging tasks and to gain first experience with building complex software systems. The course goals were supported by providing basic preprocessing components, like a tokenizer or a PoS tagger, on the basis of the Unstructured Information Management Architecture (UIMA) (Ferrucci and Lally, 2004). Thus, students of computational linguistics can concentrate on their core competence and work on more challenging tasks, both in terms of theoretical complexity and industrial relevance. As a side effect, components developed in the course are robust and scalable, which enables re-use by the research community. UIMA allows us to shift the focus from software engineering to research-relevant tasks, like thorough evaluation of the projects.

Item Type: Conference or Workshop Item
Published: 2007
Creators: Gurevych, Iryna and Müller, Christof and Zesch, Torsten
Title: Teaching "Unstructured Information Management: Theory and Applications" to Computational Linguistics Students
Language: German

Title of Book: Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Telecooperation
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 31 Dec 2016 12:59
Official URL: https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/pu...
Identification Number: GurevychEtal2007teaching