Automatic Analysis of Flaws in Pre-Trained NLP Models

Eckart de Castilho, Richard (2016)
Automatic Analysis of Flaws in Pre-Trained NLP Models.
Osaka, Japan
Conference publication, Bibliography

URL / URN: http://www.aclweb.org/anthology/W16-5203

Abstract

Most tools for natural language processing (NLP) today are based on machine learning and come with pre-trained models. In addition, third parties provide pre-trained models for popular NLP tools. The predictive power and accuracy of these tools depend on the quality of these models. Downstream researchers often base their results on pre-trained models instead of training their own. Consequently, pre-trained models are an essential resource to our community. However, to the best of our knowledge, no systematic study of pre-trained models has been conducted so far. This paper reports on the analysis of 274 pre-trained models for six NLP tools and four potential causes of problems: encoding, tokenization, normalization, and change over time. The analysis is implemented in the open-source tool Model Investigator. Our work 1) allows model consumers to better assess whether a model is suitable for their task, 2) enables tool and model creators to sanity-check their models before distributing them, and 3) enables improvements in tool interoperability by performing automatic adjustments of normalization or other pre-processing based on the models used.

Item type:	Conference publication
Published:	2016
Author(s):	Eckart de Castilho, Richard
Type of entry:	Bibliography
Title:	Automatic Analysis of Flaws in Pre-Trained NLP Models
Language:	English
Publication year:	December 2016
Book title:	Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI3nOIAF2) at COLING 2016
Event location:	Osaka, Japan
URL / URN:	http://www.aclweb.org/anthology/W16-5203
Uncontrolled keywords:	CEDIFOR;UKP_s_DKPro_Core;UKP_p_DKPro;UKP_reviewed;UKP_p_OpenMinTeD
ID number:	TUD-CS-2016-14654
Department(s)/field(s):	20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing; DFG Research Training Groups; DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date deposited:	31 Dec 2016 14:29
Last modified:	05 Oct 2018 09:02
