Efficient Pairwise Multilabel Classification for Large-Scale Problems in the Legal Domain

Loza Mencía, Eneldo ; Fürnkranz, Johannes
Eds.: Daelemans, Walter ; Goethals, Bart ; Morik, Katharina (2008)
Efficient Pairwise Multilabel Classification for Large-Scale Problems in the Legal Domain.
doi: 10.1007/978-3-540-87481-2_4
Conference publication, Bibliography

Abstract

In this paper we applied multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. On this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4,000 classes. We evaluated three algorithms: (i) the binary relevance approach, which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each pair of labels. All algorithms use the simple but very efficient perceptron algorithm as the underlying classifier, which makes them very suitable for large-scale multilabel classification problems. The main challenge we had to face was that the almost 8,000,000 perceptrons that had to be trained in the pairwise setting could no longer be stored in memory. We solved this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrate its feasibility for large-scale tasks.
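The pair count quoted in the abstract and the idea of a dual representation can be illustrated with a small sketch. The Python below is not the authors' implementation: it is a generic textbook dual-form perceptron with a linear kernel, and every name in it (`num_pairwise_classifiers`, `DualPerceptron`, `train_pairwise`) is hypothetical. It assumes documents are sparse feature dictionaries, and that a classifier for the label pair (u, v) is trained only on documents where exactly one of the two labels is relevant. In dual form, each perceptron keeps a coefficient for every training example on which it made a mistake, instead of an explicit weight vector.

```python
from itertools import combinations


def num_pairwise_classifiers(n_labels):
    """Number of unordered label pairs: n * (n - 1) / 2."""
    return n_labels * (n_labels - 1) // 2


def dot(u, v):
    """Sparse dot product of two {feature: value} dictionaries."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(value * v.get(key, 0.0) for key, value in u.items())


class DualPerceptron:
    """Perceptron kept as per-example coefficients (hypothetical sketch)."""

    def __init__(self):
        # training example index -> signed number of mistakes on it
        self.alpha = {}

    def score(self, x, examples):
        # Equivalent to w . x with w = sum_i alpha_i * examples[i]
        return sum(a * dot(examples[i], x) for i, a in self.alpha.items())

    def update(self, i, y, examples):
        """Standard mistake-driven update; y must be +1 or -1."""
        if y * self.score(examples[i], examples) <= 0:
            self.alpha[i] = self.alpha.get(i, 0) + y
            return True
        return False


def train_pairwise(examples, label_sets, labels, epochs=1):
    """Train one dual perceptron per unordered pair of labels."""
    models = {pair: DualPerceptron()
              for pair in combinations(sorted(labels), 2)}
    for _ in range(epochs):
        for i, relevant in enumerate(label_sets):
            for (u, v), model in models.items():
                # Use the example only if it separates u from v.
                if (u in relevant) != (v in relevant):
                    model.update(i, 1 if u in relevant else -1, examples)
    return models


# With 4,000 labels as a round upper bound for "almost 4000 classes":
print(num_pairwise_classifiers(4000))
```

With 4,000 labels the count is 7,998,000, which is consistent with the "almost 8,000,000 perceptrons" mentioned above. The sketch recomputes dot products for every pair and would not scale as written; it only shows why storing coefficients over training examples avoids holding millions of explicit weight vectors.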

Item type: Conference publication
Published: 2008
Editors: Daelemans, Walter ; Goethals, Bart ; Morik, Katharina
Author(s): Loza Mencía, Eneldo ; Fürnkranz, Johannes
Entry type: Bibliography
Title: Efficient Pairwise Multilabel Classification for Large-Scale Problems in the Legal Domain
Language: English
Year of publication: 2008
Publisher: Springer
Book title: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD-2008), Part II
Series volume: 5212
DOI: 10.1007/978-3-540-87481-2_4

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering
Date deposited: 24 Jun 2011 15:09
Last modified: 03 Jun 2018 21:24