Prior Art Search using International Patent Classification Codes and All-Claims-Queries

Szarvas, György and Herbert, Benjamin and Gurevych, Iryna (2009):
Prior Art Search using International Patent Classification Codes and All-Claims-Queries.
In: Working Notes of the 10th Workshop of the Cross Language Evaluation Forum (CLEF), Corfu, Greece, [Online-Edition: https://link.springer.com/chapter/10.1007/978-3-642-15754-7_...],
[Conference or Workshop Item]

Abstract

In this study, we describe our system at the Intellectual Property track of the 2009 Cross-Language Evaluation Forum campaign (CLEF-IP).

Task: The CLEF-IP challenge addressed prior art search for patent applications. Prior art is understood as information that renders the current application's novelty claims invalid, and thus might hinder the patentability of the application. Different subtasks allowed query formulation in different languages (German, French, English), while the main task allowed the construction of queries in all three languages. The same multilingual document collection was used for each task, consisting of about 1 million patents.

Objectives: Our main objective was to evaluate the benefit of incorporating manually assigned codes corresponding to categories of the International Patent Classification (IPC) taxonomy in the retrieval process. These codes describe the topics relevant to the invention in terms of the IPC category labels.

Approach: We used the Apache Lucene IR library to conduct experiments with the traditional TF-IDF-based ranking approach, indexing both the textual content of each patent and the IPC codes assigned to each document. We formulated our queries by using all claims and the title of a patent application in order to measure the (weighted) lexical overlap between topics and prior art candidates. We also formulated a language-independent query using the IPC codes of a document to improve the coverage and to obtain a more accurate ranking of candidates. Additionally, we used the IPC taxonomy (the categories and their short descriptive texts) to perform concept-based query expansion (Qiu and Frei, 1993) for measuring the semantic overlap between topics and prior art candidates, and tried to incorporate this information into our system's ranking process.

Resources used: We used the patent's textual content, the patent's IPC codes and the IPC taxonomy (as provided by the World Intellectual Property Organization).

Results: Probably due to an insufficient length of definition texts in the IPC taxonomy (used to define the concept mapping of our model), incorporating the concept-based similarity measure did not improve our performance and was thus excluded from the final submission. Using the extended boolean vector space model as implemented by Lucene, our system remained efficient (3 seconds needed to process 1 topic) and still yielded fair performance: it achieved the 6th best Mean Average Precision score out of 14 participating systems on 500 topics, and the 4th best score out of 9 participants in the large-scale evaluation (with 10,000 topics).
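
The general idea in the Approach paragraph (an all-claims text query scored by TF-IDF, combined with a language-independent IPC code match) can be illustrated with a small pure-Python sketch. This is not the authors' Lucene setup: the cosine scoring, the Jaccard overlap on IPC codes, the `ipc_weight` parameter, the record layout and the toy patents are all assumptions made for illustration, and the query is included when computing document frequencies purely for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one TF-IDF weighted term vector (a dict) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF; a common variant, not Lucene's exact formula.
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_prior_art(topic, candidates, ipc_weight=0.5):
    """Rank candidates by lexical similarity plus weighted IPC code overlap."""
    # All-claims query: title and every claim of the topic application.
    texts = [tokenize(topic["title"] + " " + " ".join(topic["claims"]))]
    texts += [tokenize(c["text"]) for c in candidates]
    vectors = tfidf_vectors(texts)
    query_vec, cand_vecs = vectors[0], vectors[1:]
    topic_codes = set(topic["ipc"])
    scored = []
    for cand, vec in zip(candidates, cand_vecs):
        lexical = cosine(query_vec, vec)
        codes = set(cand["ipc"])
        union = topic_codes | codes
        # Language-independent signal: Jaccard overlap of IPC codes.
        ipc_overlap = len(topic_codes & codes) / len(union) if union else 0.0
        scored.append((lexical + ipc_weight * ipc_overlap, cand["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical toy data, invented for this example.
topic = {
    "title": "Rotor blade for a wind turbine",
    "claims": [
        "A rotor blade comprising a heating element.",
        "The rotor blade of claim 1 wherein the heating element is resistive.",
    ],
    "ipc": ["F03D", "H05B"],
}
candidates = [
    {"id": "EP-A", "ipc": ["F03D"],
     "text": "A wind turbine rotor blade with an embedded heating element for de-icing."},
    {"id": "EP-B", "ipc": ["A47J"],
     "text": "A coffee machine with a resistive heating element."},
    {"id": "EP-C", "ipc": ["C12C"],
     "text": "Method for brewing beer."},
]

ranking = rank_prior_art(topic, candidates)
print(ranking)
```

Because the IPC term is computed from codes alone, it contributes the same score regardless of the language in which the candidate patent is written, which is the property the abstract relies on for the multilingual collection.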

Item Type: Conference or Workshop Item
Published: 2009
Creators: Szarvas, György and Herbert, Benjamin and Gurevych, Iryna
Title: Prior Art Search using International Patent Classification Codes and All-Claims-Queries
Language: English
Title of Book: Working Notes of the 10th Workshop of the Cross Language Evaluation Forum (CLEF)
Uncontrolled Keywords: Semantic Information Management;UKP_a_SIM;UKP_p_SIGMUND;Patent Information Retrieval, Invalidity Search
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Corfu, Greece
Date Deposited: 31 Dec 2016 14:29
Official URL: https://link.springer.com/chapter/10.1007/978-3-642-15754-7_...
Identification Number: TUD-CS-2009-0146