Retrieve information from construction documents with BERT and unsupervised learning

Shi, Meiling ; Heinz, Tobias ; Rüppel, Uwe
Ed.: Scherer, Raimar (2023)
Retrieve information from construction documents with BERT and unsupervised learning.
14th European Conference on Product & Process Modelling (ECPPM 2022). Trondheim, Norway (September 2022)
doi: 10.1201/9781003354222-51
Conference publication, Bibliography

Abstract

Text documents from previous projects are still rarely used for decision-making in the construction industry. One reason is that information formulated in unstructured natural language cannot be processed directly by computer programs, and search is conducted by keyword matching, which is inefficient and imprecise. To make the information in unstructured text documents accessible to digital processes without introducing additional manual work, we propose using natural language processing (NLP) and unsupervised learning methods to automatically extract information from unstructured text documents. This paper describes an NLP-based pipeline that includes methods for data acquisition and preprocessing, different transformer-based embedding methods, and subsequent downstream tasks. Our proof of concept is trained on German-language documents from different waterway construction projects. Because it relies on unsupervised learning and available language models, this pipeline can be generalized to other languages and construction types.

Item type: Conference publication
Published: 2023
Editor: Scherer, Raimar
Author(s): Shi, Meiling ; Heinz, Tobias ; Rüppel, Uwe
Entry type: Bibliography
Title: Retrieve information from construction documents with BERT and unsupervised learning
Language: English
Publication date: March 2023
Place of publication: London
Publisher: CRC Press
Book title: ECPPM 2022 - eWork and eBusiness in Architecture, Engineering and Construction 2022
Event title: 14th European Conference on Product & Process Modelling (ECPPM 2022)
Event location: Trondheim, Norway
Event date: September 2022
Edition: 1st edition
DOI: 10.1201/9781003354222-51
URL / URN: https://www.taylorfrancis.com/books/9781003354222/chapters/1...

Department(s)/field(s): 13 Fachbereich Bau- und Umweltingenieurwissenschaften
13 Fachbereich Bau- und Umweltingenieurwissenschaften > Institut für Numerische Methoden und Informatik im Bauwesen
Date deposited: 03 Nov 2023 10:41
Last modified: 03 Nov 2023 10:42