Segmentation of legal documents

Loza Mencía, Eneldo (2009)
Segmentation of legal documents.
doi: 10.1145/1568234.1568245
Conference publication, Bibliography

Abstract

An overwhelming number of legal documents is available in digital form. However, most of the texts are usually only provided in a semi-structured form, i.e. the documents are structured only implicitly using text formatting and alignment. In this form the documents are perfectly understandable by a human, but not by a machine. This is an obstacle towards advanced intelligent legal information retrieval and knowledge systems. The reason for this lack of structured knowledge is that the conversion of texts in conventional form into a structured, machine-readable form, a process called segmentation, is frequently done manually and is therefore very expensive. We introduce a trainable system based on state-of-the-art Information Extraction techniques for the automatic segmentation of legal documents. Our system makes special use of the implicitly given structure in the source digital file as well as of the explicit knowledge about the target structure. Our evaluation on the French IPR Law demonstrates that the system is able to learn an effective segmenter given only a few manually processed training documents. In some cases, even only one seen example is sufficient in order to correctly process the remaining documents.

Item type: Conference publication
Published: 2009
Author(s): Loza Mencía, Eneldo
Entry type: Bibliography
Title: Segmentation of legal documents
Language: English
Year of publication: 2009
Publisher: ACM
Book title: Proceedings of the 12th International Conference on Artificial Intelligence and Law
DOI: 10.1145/1568234.1568245

Division(s): 20 Department of Computer Science > Knowledge Engineering
20 Department of Computer Science
Date deposited: 24 Jun 2011 14:46
Last modified: 05 Mar 2013 09:49