
Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning

Chegini, Mohammad ; Bernard, Jürgen ; Berger, Philip ; Sourin, Alexei ; Andrews, Keith ; Schreck, Tobias (2019)
Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning.
In: Visual Informatics, 3 (1)
doi: 10.1016/j.visinf.2019.03.002
Article, Bibliography

Abstract

Supervised machine learning techniques require labelled multivariate training datasets. Many approaches address the issue of unlabelled datasets by tightly coupling machine learning algorithms with interactive visualisations. Using appropriate techniques, analysts can play an active role in a highly interactive and iterative machine learning process to label the dataset and create meaningful partitions. While this principle has been implemented either for unsupervised, semi-supervised, or supervised machine learning tasks, the combination of all three methodologies remains challenging. In this paper, a visual analytics approach is presented, combining a variety of machine learning capabilities with four linked visualisation views, all integrated within the mVis (multivariate Visualiser) system. The available palette of techniques allows an analyst to perform exploratory data analysis on a multivariate dataset and divide it into meaningful labelled partitions, from which a classifier can be built. In the workflow, the analyst can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once a dataset has been interactively labelled, the analyst can continue the workflow with supervised machine learning to assess to what degree the subsequent classifier has effectively learned the concepts expressed in the labelled training dataset. Using a novel technique called automatic dimension selection, interactions the analyst had with dimensions of the multivariate dataset are used to steer the machine learning algorithms. A real-world football dataset is used to show the utility of mVis for a series of analysis and labelling tasks, from initial labelling through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to finally producing a high-quality labelled training dataset suitable for training a classifier. The tool empowers the analyst with interactive visualisations including scatterplots, parallel coordinates, similarity maps for records, and a new similarity map for partitions.

Item type: Article
Published: 2019
Author(s): Chegini, Mohammad ; Bernard, Jürgen ; Berger, Philip ; Sourin, Alexei ; Andrews, Keith ; Schreck, Tobias
Entry type: Bibliography
Title: Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning
Language: English
Publication date: March 2019
Publisher: Elsevier ScienceDirect
Title of journal, newspaper, or series: Visual Informatics
Volume: 3
Issue number: 1
DOI: 10.1016/j.visinf.2019.03.002
URL / URN: https://doi.org/10.1016/j.visinf.2019.03.002

Uncontrolled keywords: Labeling, Clustering, Active learning, Multivariate data, Visualization
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Interactive Graphics Systems (Graphisch-Interaktive Systeme)
Date deposited: 26 Aug 2020 13:06
Last modified: 26 Aug 2020 13:06