Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain

Loza Mencía, Eneldo and Fürnkranz, Johannes; Francesconi, Enrico and Montemagni, Simonetta and Peters, Wim and Tiscornia, Daniela (eds.) (2010):
Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain.
In: Semantic Processing of Legal Texts -- Where the Language of Law Meets the Law of Language, Springer-Verlag, pp. 192-215, [Online-Edition: http://www.ke.tu-darmstadt.de/publications/papers/loza10eurl...],
[Book Section]

Abstract

In this paper we apply multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. For this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluated three algorithms: (i) the binary relevance approach, which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each pair of labels. All algorithms use the simple but very efficient perceptron algorithm as the underlying classifier, which makes them very suitable for large-scale multilabel classification problems. The main challenge we had to face was that the almost 8,000,000 perceptrons that had to be trained in the pairwise setting could no longer be stored in memory. We solve this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrate the feasibility of this approach for large-scale tasks.
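
The two quantitative points in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not describe how the dual form is shared across the pairwise classifiers. The round figure of 4000 labels is an assumed upper bound standing in for "almost 4000 classes", and `pairwise_count`, `dot`, and `DualPerceptron` are hypothetical names chosen for illustration. The sketch shows why one perceptron per label pair yields close to 8,000,000 models, and how a single perceptron can be stored in dual form as coefficients over training examples instead of as an explicit weight vector.

```python
from typing import Dict, List

# Sparse feature vector: feature index -> value (assumed document representation).
SparseVec = Dict[int, float]


def pairwise_count(n_labels: int) -> int:
    """Number of unordered label pairs, i.e. one perceptron per pair."""
    return n_labels * (n_labels - 1) // 2


def dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product of two sparse vectors, iterating over the shorter one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(index, 0.0) for index, value in a.items())


class DualPerceptron:
    """Binary perceptron in dual form.

    Instead of a weight vector w, it keeps one coefficient per training
    example on which a mistake was made, so that w = sum_i alpha_i * x_i.
    """

    def __init__(self) -> None:
        self.alpha: Dict[int, float] = {}  # training index -> signed update count

    def score(self, x: SparseVec, data: List[SparseVec]) -> float:
        return sum(a * dot(data[i], x) for i, a in self.alpha.items())

    def fit(self, data: List[SparseVec], targets: List[int], epochs: int = 1) -> None:
        for _ in range(epochs):
            for i, (x, y) in enumerate(zip(data, targets)):
                # Standard perceptron rule: update only on a mistake (or zero margin).
                if y * self.score(x, data) <= 0:
                    self.alpha[i] = self.alpha.get(i, 0.0) + y


# 4000 is an assumed round number, not the exact EUROVOC label count.
print(pairwise_count(4000))

# Tiny toy problem with targets in {+1, -1}.
docs = [{0: 1.0}, {1: 1.0}]
labels = [1, -1]
model = DualPerceptron()
model.fit(docs, labels, epochs=2)
print(model.alpha)
```

In this form the memory cost of a classifier grows with the number of training examples it erred on, not with the dimensionality of the feature space, which is one plausible reason a dual representation helps when millions of classifiers are needed.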

Item Type: Book Section
Published: 2010
Editors: Francesconi, Enrico and Montemagni, Simonetta and Peters, Wim and Tiscornia, Daniela
Creators: Loza Mencía, Eneldo and Fürnkranz, Johannes
Title: Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain
Language: English
Title of Book: Semantic Processing of Legal Texts -- Where the Language of Law Meets the Law of Language
Volume: 6036
Publisher: Springer-Verlag
ISBN: 978-3-642-12836-3
Uncontrolled Keywords: EUR-Lex Database, learning by pairwise comparison, Legal Documents, multilabel classification, Text Classification
Divisions: 20 Department of Computer Science > Knowledge Engineering
20 Department of Computer Science
Date Deposited: 24 Jun 2011 14:22
Official URL: http://www.ke.tu-darmstadt.de/publications/papers/loza10eurl...
Identification Number: doi:10.1007/978-3-642-12837-0_11