Shi, Meiling ; Hoffmann, André ; Rüppel, Uwe
Eds.: Semenov, Vitaly ; Scherer, Raimar J. (2021)
Applying weak supervision to classify scarce labeled technical documents.
ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction. Moscow, Russia (15.09.2021-17.09.2021)
doi: 10.1201/9781003191476-31
Conference publication, Bibliography
Abstract
With digitalization in the construction industry, the number of project-relevant documents is growing. Organizing these documents in a searchable manner through classification has become a challenge. The German Waterways and Shipping Administration (WSV) is one of the organizations facing this problem. Manual classification is nearly impossible because of the considerable expense. In parallel, text classification with machine learning is drawing increasing attention. Classification is a supervised machine learning task that requires large labeled data samples. In the filing system used by the WSV, only a small amount of data with ground-truth labels is available, and manual annotation is tedious and expensive. To address the shortage of training data, we propose applying weakly supervised learning, in which noisy and inexact labels can be used in the training process. In this study, we inject domain knowledge into the training process with the weak supervision framework Snorkel to construct a labeling model that programmatically annotates data. We then trained classifiers on the original dataset together with the dataset annotated by the labeling model. The results show that even though the programmatically annotated dataset is noisy, it can still train a generalized classifier and improve the classifiers’ performance.
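The idea of programmatic annotation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Snorkel itself: the document classes, keywords, and function names below are hypothetical, and a simple majority vote stands in for Snorkel's label model, which weights labeling functions rather than counting every vote equally.

```python
import re
from collections import Counter

ABSTAIN = -1
# Hypothetical classes; the abstract does not list the WSV categories.
REPORT, DRAWING = 0, 1


def lf_keyword_report(text):
    """Vote REPORT if the text mentions the word 'report'."""
    return REPORT if "report" in text.lower() else ABSTAIN


def lf_keyword_drawing(text):
    """Vote DRAWING if the text mentions the word 'drawing'."""
    return DRAWING if "drawing" in text.lower() else ABSTAIN


def lf_scale_notation(text):
    """Vote DRAWING if a scale such as '1:100' appears in the text."""
    return DRAWING if re.search(r"\b1:\d+\b", text) else ABSTAIN


# Each labeling function encodes one piece of (assumed) domain knowledge.
LABELING_FUNCTIONS = [lf_keyword_report, lf_keyword_drawing, lf_scale_notation]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def build_training_set(labeled, unlabeled):
    """Merge ground-truth examples with weakly labeled ones, dropping abstains."""
    weak = [(text, weak_label(text)) for text in unlabeled]
    return list(labeled) + [(text, y) for text, y in weak if y != ABSTAIN]
```

The combined list returned by `build_training_set` would then be passed to any ordinary text classifier. The weak labels remain noisy, which is the trade-off the abstract describes.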
Item type: | Conference publication |
---|---|
Published: | 2021 |
Editors: | Semenov, Vitaly ; Scherer, Raimar J. |
Author(s): | Shi, Meiling ; Hoffmann, André ; Rüppel, Uwe |
Type of entry: | Bibliography |
Title: | Applying weak supervision to classify scarce labeled technical documents |
Language: | English |
Date of publication: | 17 September 2021 |
Place of publication: | London |
Publisher: | CRC Press |
Book title: | ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction: Proceedings of the 13th European Conference on Product & Process Modelling 2021 |
Event title: | ECPPM 2021 – eWork and eBusiness in Architecture, Engineering and Construction |
Event location: | Moscow, Russia |
Event dates: | 15.09.2021-17.09.2021 |
DOI: | 10.1201/9781003191476-31 |
URL / URN: | https://www.taylorfrancis.com/chapters/edit/10.1201/97810031... |
Keywords: | Weakly supervised learning, text classification, machine learning |
Divisions: | 13 Fachbereich Bau- und Umweltingenieurwissenschaften; 13 Fachbereich Bau- und Umweltingenieurwissenschaften > Institut für Numerische Methoden und Informatik im Bauwesen |
Date deposited: | 03 Mar 2023 06:36 |
Last modified: | 20 Apr 2023 08:01 |
PPN: | 507179919 |