Müller, Marvin ; Longard, Lukas ; Metternich, Joachim (2022)
Comparison of preprocessing approaches for text data in digital shop floor management systems.
In: Procedia CIRP, 107
doi: 10.1016/j.procir.2022.04.030
Article, Bibliography
Abstract
In an increasing number of production companies, shop floor management (SFM) is supported by digital systems. The data generated while working with these systems can be used for assistance systems to further enhance the value of digital SFM. Several assistance systems using text data from problem-solving processes have been suggested but had limited quality due to the domain-specific language characteristics: short texts with spelling errors and the usage of synonyms. This research aims to quantify the improvement potential of different preprocessing approaches on the quality of the assistance systems. For that, and for comparison in the research community, a public, labeled data set is needed. This paper introduces such a data set based on the characteristics identified in three real industry data sets. To overcome the problems in text processing of shop floor data (e.g. domain-specific synonyms), several approaches are suggested, tested, and compared to a generic approach for text clustering. The study identifies best practices for the handling of shop floor text data and provides a data set with the goal of simplifying and stimulating research on this topic.
Item type: | Article |
---|---|
Published: | 2022 |
Author(s): | Müller, Marvin ; Longard, Lukas ; Metternich, Joachim |
Type of entry: | Bibliography |
Title: | Comparison of preprocessing approaches for text data in digital shop floor management systems |
Language: | English |
Publication date: | 26 May 2022 |
Publisher: | Elsevier B.V. |
Title of journal, newspaper, or series: | Procedia CIRP |
Volume of the journal: | 107 |
DOI: | 10.1016/j.procir.2022.04.030 |
Keywords: | Data quality improvement, natural language processing, text mining |
Department(s)/field(s): | 16 Department of Mechanical Engineering; 16 Department of Mechanical Engineering > Institute of Production Management, Technology and Machine Tools (PTW) |
Date deposited: | 29 Jul 2022 06:56 |
Last modified: | 06 Oct 2022 08:35 |
PPN: | 497712326 |