Prior art search using international patent classification codes and all-claims-queries

Benjamin Herbert*, Gyoergy Szarvas, Iryna Gurevych

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

3 Citations (Scopus)


In this paper, we describe the system we developed for the Intellectual Property track of the 2009 Cross-Language Evaluation Forum. The track addressed prior art search for patent applications. We used the Lucene library to conduct experiments with the traditional TF-IDF-based ranking approach, indexing both the textual content and the IPC codes assigned to each document. We formulated our queries by using the title and claims of a patent application in order to measure the (weighted) lexical overlap between topics and prior art candidates. We also formulated a language-independent query using the IPC codes of a document to improve the coverage and to obtain a more accurate ranking of candidates. Using a simple model, our system remained efficient and had a reasonably good performance score: it achieved the 6th best Mean Average Precision score out of 14 participating systems on 500 topics, and the 4th best score out of 9 participants on 10,000 topics.
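The ranking idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' Lucene setup: the scoring formula is a simplified TF-IDF variant, and the names `tfidf_score`, `rank`, and `ipc_weight`, as well as the three toy documents, are assumptions made only for this illustration. It shows how a text query built from title and claims can be combined with a language-independent query on IPC codes, so that a candidate in another language is still retrieved through its shared classification.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return text.lower().split()


def tfidf_score(query_terms, doc_terms, doc_freq, n_docs):
    """Sum sqrt(tf) * idf over the distinct query terms found in the document."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if tf[term]:
            idf = math.log(1 + n_docs / doc_freq[term])
            score += math.sqrt(tf[term]) * idf
    return score


def rank(topic, corpus, ipc_weight=1.0):
    """Rank candidates by text overlap plus weighted IPC-code overlap."""
    n_docs = len(corpus)
    text_df, ipc_df = Counter(), Counter()
    for doc in corpus.values():
        text_df.update(set(tokenize(doc["text"])))
        ipc_df.update(set(doc["ipc"]))

    # Text query: title and claims of the topic application.
    query_text = tokenize(topic["title"] + " " + topic["claims"])

    scores = {}
    for doc_id, doc in corpus.items():
        text_part = tfidf_score(query_text, tokenize(doc["text"]), text_df, n_docs)
        ipc_part = tfidf_score(topic["ipc"], doc["ipc"], ipc_df, n_docs)
        scores[doc_id] = text_part + ipc_weight * ipc_part
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: EP3 is German and shares no words with the topic.
corpus = {
    "EP1": {"text": "rotor blade for a wind turbine", "ipc": ["F03D"]},
    "EP2": {"text": "method for brewing coffee", "ipc": ["A47J"]},
    "EP3": {"text": "Rotorblatt für eine Windkraftanlage", "ipc": ["F03D"]},
}
topic = {
    "title": "wind turbine blade",
    "claims": "a rotor blade comprising a shell",
    "ipc": ["F03D"],
}

ranking = rank(topic, corpus)
print(ranking)
```

In this toy setting the German document has no lexical overlap with the English topic, so only the IPC part of the query lifts it above the unrelated candidate. The relative weight of the two parts (`ipc_weight` here) is a free parameter in this sketch, not a value reported in the abstract.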

Original language: English
Title of host publication: Multilingual information access evaluation I: text retrieval experiments
Editors: C Peters, GM DiNunzio, M Kurimo, T Mandl, D Mostefa, A Penas, G Roda
Place of Publication: Berlin; Heidelberg
Publisher: Springer, Springer Nature
Number of pages: 8
ISBN (Print): 9783642157530
Publication status: Published - 2010
Event: 10th Workshop of the Cross-Language Evaluation Forum - Corfu, Greece
Duration: 30 Sep 2009 to 2 Oct 2009

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743


Conference: 10th Workshop of the Cross-Language Evaluation Forum
