
Comparison of preprocessing approaches for text data in digital shop floor management systems

Müller, Marvin ; Longard, Lukas ; Metternich, Joachim (2022)
Comparison of preprocessing approaches for text data in digital shop floor management systems.
In: Procedia CIRP, 107
doi: 10.1016/j.procir.2022.04.030
Article, Bibliography

Abstract

In an increasing number of production companies, shop floor management (SFM) is supported by digital systems. The data generated while working with these systems can be used in assistance systems to further enhance the value of digital SFM. Several assistance systems using text data from problem-solving processes have been proposed, but their quality has been limited by the domain-specific characteristics of the language: short texts with spelling errors and the use of synonyms. This research aims to quantify how much different preprocessing approaches can improve the quality of these assistance systems. For this purpose, and to enable comparison within the research community, a public, labeled data set is needed. This paper introduces such a data set, based on the characteristics identified in three real industrial data sets. To overcome the problems of processing shop floor text data (e.g., domain-specific synonyms), several approaches are proposed, tested, and compared with a generic approach to text clustering. The study identifies best practices for handling shop floor text data and provides a data set intended to simplify and stimulate research on this topic.
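The abstract names two characteristics that hamper shop floor texts: spelling errors and domain-specific synonyms. As a minimal sketch of what such preprocessing can look like, assuming a hand-built vocabulary and synonym map, the Python below first snaps each token to the closest known term using fuzzy matching from the standard-library `difflib`, then maps synonyms to one canonical term. The names `VOCABULARY`, `SYNONYMS`, and `preprocess`, the example terms, and the 0.8 similarity cutoff are all hypothetical illustrations; this is not the authors' method, and the specific approaches compared in the paper are not described in this record.

```python
import difflib
import re

# Hypothetical canonical vocabulary; in practice it would be derived from the
# company's own shop floor data.
VOCABULARY = ["machine", "stop", "conveyor", "sensor", "defect", "missing", "material"]
# Hypothetical domain-specific synonyms, mapped to one canonical term each.
SYNONYMS = {"mach": "machine", "standstill": "stop", "belt": "conveyor"}


def preprocess(text: str, vocabulary=VOCABULARY, synonyms=SYNONYMS,
               cutoff: float = 0.8) -> list[str]:
    """Lowercase, tokenize, correct misspellings, and normalize synonyms."""
    # Known terms include synonym keys, so misspelled synonyms are caught too.
    known = sorted(set(vocabulary) | set(synonyms))
    # Split into runs of Unicode letters (digits and punctuation are dropped).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    result = []
    for token in tokens:
        if token not in known:
            # Spelling correction: closest known term above the similarity cutoff.
            match = difflib.get_close_matches(token, known, n=1, cutoff=cutoff)
            if match:
                token = match[0]
        # Synonym normalization: replace with the canonical term if one exists.
        result.append(synonyms.get(token, token))
    return result


if __name__ == "__main__":
    print(preprocess("Conveyer standstill, sensor defekt"))
```

Run on the input `"Conveyer standstill, sensor defekt"`, the sketch returns `["conveyor", "stop", "sensor", "defect"]`. Correcting before mapping synonyms is a design choice: it lets a misspelled synonym still reach its canonical term, at the cost of needing a well-curated list of known terms.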

Entry type: Article
Published: 2022
Author(s): Müller, Marvin ; Longard, Lukas ; Metternich, Joachim
Record type: Bibliography
Title: Comparison of preprocessing approaches for text data in digital shop floor management systems
Language: English
Publication date: 26 May 2022
Publisher: Elsevier B.V.
Journal, newspaper, or series title: Procedia CIRP
Volume: 107
DOI: 10.1016/j.procir.2022.04.030

Keywords: Data quality improvement, natural language processing, text mining
Department(s)/field(s): 16 Fachbereich Maschinenbau (Department of Mechanical Engineering)
16 Fachbereich Maschinenbau > Institut für Produktionsmanagement und Werkzeugmaschinen (PTW)
Date deposited: 29 Jul 2022 06:56
Last modified: 06 Oct 2022 08:35
PPN: 497712326