Szarvas, György ; Herbert, Benjamin ; Gurevych, Iryna (2009)
Prior Art Search using International Patent Classification Codes and All-Claims-Queries.
Corfu, Greece
Conference publication, Bibliography
Abstract
In this study, we describe our system at the Intellectual Property track of the 2009 Cross-Language Evaluation Forum campaign (CLEF-IP).
Task: The CLEF-IP challenge addressed prior art search for patent applications. Prior art is understood as information that renders the current application's novelty claims invalid, and thus might hinder the patentability of the application. Different subtasks allowed query formulation in different languages (German, French, English), while the main task allowed the construction of queries in all three languages. The same multilingual document collection, consisting of about 1 million patents, was used for each task.
Objectives: Our main objective was to evaluate the benefit of incorporating manually assigned codes corresponding to categories of the International Patent Classification (IPC) taxonomy in the retrieval process. These codes describe the topics relevant to the invention in terms of the IPC category labels.
Approach: We used the Apache Lucene IR library to conduct experiments with the traditional TF-IDF-based ranking approach, indexing both the textual content of each patent and the IPC codes assigned to each document. We formulated our queries using all claims and the title of a patent application in order to measure the (weighted) lexical overlap between topics and prior art candidates. We also formulated a language-independent query using the IPC codes of a document to improve coverage and to obtain a more accurate ranking of candidates. Additionally, we used the IPC taxonomy (the categories and their short descriptive texts) to perform concept-based query expansion (Qiu and Frei, 1993) for measuring the semantic overlap between topics and prior art candidates, and tried to incorporate this information into our system's ranking process.
Resources used: We used the patent's textual content, the patent's IPC codes and the IPC taxonomy (as provided by the World Intellectual Property Organization).
Results: Probably because the definition texts in the IPC taxonomy (used to define the concept mapping of our model) are too short, incorporating the concept-based similarity measure did not improve our performance, and it was thus excluded from the final submission. Using the extended Boolean vector space model as implemented by Lucene, our system remained efficient (3 seconds to process one topic) and still yielded fair performance: it achieved the 6th best Mean Average Precision score out of 14 participating systems on 500 topics, and the 4th best score out of 9 participants in the large-scale evaluation (with 10,000 topics).
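The approach described in the abstract, lexical overlap from title and claims combined with a language-independent IPC code match, can be illustrated with a minimal sketch. This is not the authors' Lucene configuration and does not reproduce Lucene's scoring formula: it uses a plain smoothed TF-IDF cosine similarity, and the function names, the `ipc_weight` parameter, the additive score combination, and the toy documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into letter-only tokens (no stemming, no stop words)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over the text field of all documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc["text"])))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def ipc_overlap(query_codes, doc_codes):
    """Fraction of the topic's IPC codes that the candidate shares."""
    query_codes = set(query_codes)
    if not query_codes:
        return 0.0
    return len(query_codes & set(doc_codes)) / len(query_codes)


def rank(topic, docs, ipc_weight=0.5):
    """Rank candidates by text similarity plus a weighted IPC match (assumed combination)."""
    idf = build_idf(docs)
    # All-claims query: the title and every claim form one bag of words.
    query_vec = tfidf_vector(topic["title"] + " " + " ".join(topic["claims"]), idf)
    scored = []
    for doc in docs:
        text_score = cosine(query_vec, tfidf_vector(doc["text"], idf))
        code_score = ipc_overlap(topic["ipc"], doc["ipc"])
        scored.append((text_score + ipc_weight * code_score, doc["id"]))
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical toy collection: identifiers, texts and codes are made up.
docs = [
    {"id": "EP-A", "text": "A brake system for a bicycle with a hydraulic caliper", "ipc": ["B62L 1/00"]},
    {"id": "EP-B", "text": "Method for brewing coffee using a pressurised capsule", "ipc": ["A47J 31/00"]},
    {"id": "EP-C", "text": "Bremsanlage für ein Fahrrad", "ipc": ["B62L 1/00"]},
]
topic = {
    "title": "Hydraulic bicycle brake",
    "claims": ["A bicycle brake comprising a hydraulic caliper."],
    "ipc": ["B62L 1/00"],
}

ranking = rank(topic, docs)
print(ranking)
```

In this toy setup the German-language candidate shares no words with the English query, so it is ranked above the unrelated document only because of the shared IPC code. That is the coverage effect the language-independent IPC query is meant to provide.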
Item type: | Conference publication |
---|---|
Published: | 2009 |
Author(s): | Szarvas, György ; Herbert, Benjamin ; Gurevych, Iryna |
Entry type: | Bibliography |
Title: | Prior Art Search using International Patent Classification Codes and All-Claims-Queries |
Language: | English |
Publication year: | August 2009 |
Book title: | Working Notes of the 10th Workshop of the Cross Language Evaluation Forum (CLEF) |
Event location: | Corfu, Greece |
URL / URN: | https://link.springer.com/chapter/10.1007/978-3-642-15754-7_... |
Uncontrolled keywords: | Semantic Information Management;UKP_a_SIM;UKP_p_SIGMUND;Patent Information Retrieval, Invalidity Search |
ID number: | TUD-CS-2009-0146 |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 31 Dec 2016 14:29 |
Last modified: | 24 Jan 2020 12:03 |