
Applying weak supervision to classify scarce labeled technical documents

Shi, Meiling ; Hoffmann, André ; Rüppel, Uwe
Eds.: Semenov, Vitaly ; Scherer, Raimar J. (2021)
Applying weak supervision to classify scarce labeled technical documents.
ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction. Moscow, Russia (15.09.2021-17.09.2021)
doi: 10.1201/9781003191476-31
Conference publication, Bibliography

Abstract

With the digitalization of the construction industry, the number of project-relevant documents is growing, and organizing these documents in a searchable manner through classification has become a challenge. The German Waterways and Shipping Administration (WSV) is one of the organizations facing this problem. Manual classification is nearly impossible because of the considerable expense. In parallel, text classification with machine learning is drawing increasing attention. Classification is a supervised machine learning task, which requires large labeled datasets. In the filing system used by the WSV, only a small amount of data with ground-truth labels is available, and annotating more manually is tedious and expensive. To address this shortage of training data, we propose applying weakly supervised learning, in which noisy and inexact labels can be used in the training process. In this study, we inject domain knowledge into the training process with the weak supervision framework Snorkel to construct a labeling model that annotates data programmatically. We then train classifiers on the original dataset together with the dataset annotated by the labeling model. The results show that even though the programmatically annotated dataset is noisy, it can still train a classifier that generalizes and improve the classifiers' performance.
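The abstract describes encoding domain knowledge as labeling rules whose outputs are combined into a labeling model that annotates unlabeled documents, which are then merged with the small gold-labeled set. The sketch below illustrates that general idea in plain Python under stated assumptions; it is not the authors' implementation. The three document classes and the German keywords are hypothetical (the source does not list the WSV filing categories), and a simple majority vote stands in for Snorkel's learned label model, which estimates each labeling function's accuracy rather than counting all votes equally.

```python
from collections import Counter
from typing import Callable, List, Tuple

ABSTAIN = -1
# Hypothetical document classes; the actual WSV filing categories are not given in the source.
CONTRACT, DRAWING, REPORT = 0, 1, 2


# Hypothetical keyword-based labeling functions: each votes for one class or abstains.
def lf_contract(text: str) -> int:
    return CONTRACT if "vertrag" in text.lower() else ABSTAIN


def lf_drawing(text: str) -> int:
    t = text.lower()
    return DRAWING if "zeichnung" in t or "lageplan" in t else ABSTAIN


def lf_report(text: str) -> int:
    t = text.lower()
    return REPORT if "bericht" in t or "gutachten" in t else ABSTAIN


LFS: List[Callable[[str], int]] = [lf_contract, lf_drawing, lf_report]


def label_matrix(docs: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Apply every labeling function to every document: one row per doc, one column per LF."""
    return [[lf(doc) for lf in lfs] for doc in docs]


def majority_vote(row: List[int]) -> int:
    """Combine non-abstaining votes; abstain if no LF fired or the top classes tie."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave the document unlabeled
    return ranked[0][0]


def weakly_label(docs: List[str], lfs: List[Callable[[str], int]]) -> List[Tuple[str, int]]:
    """Return (document, label) pairs for documents that received a non-abstain label."""
    labels = [majority_vote(row) for row in label_matrix(docs, lfs)]
    return [(d, y) for d, y in zip(docs, labels) if y != ABSTAIN]


# Toy data: a tiny gold-labeled set plus unlabeled document titles.
gold = [("Bauvertrag Schleuse Kiel", CONTRACT)]
unlabeled = [
    "Lageplan Wehranlage",              # drawing keyword -> DRAWING
    "Gutachten zur Standsicherheit",    # report keyword -> REPORT
    "Protokoll der Baubesprechung",     # no LF fires -> dropped
    "Bericht und Zeichnung zum Umbau",  # REPORT vs DRAWING tie -> dropped
]

# Combined training set: gold labels plus programmatically (noisily) labeled documents.
training_set = gold + weakly_label(unlabeled, LFS)
print(training_set)
# [('Bauvertrag Schleuse Kiel', 0), ('Lageplan Wehranlage', 1), ('Gutachten zur Standsicherheit', 2)]
```

The resulting `training_set` could then be passed to any text classifier. Dropping abstained and tied documents is one design choice; a probabilistic label model would instead keep soft labels that reflect how confident the combined labeling functions are.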

Item type: Conference publication
Published: 2021
Editors: Semenov, Vitaly ; Scherer, Raimar J.
Author(s): Shi, Meiling ; Hoffmann, André ; Rüppel, Uwe
Entry type: Bibliography
Title: Applying weak supervision to classify scarce labeled technical documents
Language: English
Publication date: 17 September 2021
Place of publication: London
Publisher: CRC Press
Book title: ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction: Proceedings of the 13th European Conference on Product & Process Modelling 2021
Event title: ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction
Event location: Moscow, Russia
Event date: 15.09.2021-17.09.2021
DOI: 10.1201/9781003191476-31
URL / URN: https://www.taylorfrancis.com/chapters/edit/10.1201/97810031...

Keywords: Weakly supervised learning, Text classification, Machine learning
Department(s)/Field(s): 13 Department of Civil and Environmental Engineering Sciences
13 Department of Civil and Environmental Engineering Sciences > Institute of Numerical Methods and Informatics in Civil Engineering
Date deposited: 03 Mar 2023 06:36
Last modified: 20 Apr 2023 08:01
PPN: 507179919