Prior art search using international patent classification codes and all-claims-queries

Benjamin Herbert*, Gyoergy Szarvas, Iryna Gurevych

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

    5 Citations (Scopus)


    In this paper, we describe the system we developed for the Intellectual Property track of the 2009 Cross-Language Evaluation Forum. The track addressed prior art search for patent applications. We used the Lucene library to conduct experiments with the traditional TF-IDF-based ranking approach, indexing both the textual content and the IPC codes assigned to each document. We formulated our queries by using the title and claims of a patent application in order to measure the (weighted) lexical overlap between topics and prior art candidates. We also formulated a language-independent query using the IPC codes of a document to improve the coverage and to obtain a more accurate ranking of candidates. Using a simple model, our system remained efficient and had a reasonably good performance score: it achieved the 6th best Mean Average Precision score out of 14 participating systems on 500 topics, and the 4th best score out of 9 participants on 10,000 topics.
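
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Lucene-based system: it assumes a plain-Python TF-IDF cosine score between a title-plus-claims query and each candidate's text, plus a Jaccard overlap of IPC codes as the language-independent component. The function names, the dictionary fields (`id`, `text`, `title`, `claims`, `ipc`), the `ipc_weight` parameter, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency over the candidate collection.
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # Log-scaled term frequency times IDF; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ipc_overlap(query_codes, doc_codes):
    # Jaccard overlap of IPC code sets (an assumed measure, chosen for simplicity).
    q, d = set(query_codes), set(doc_codes)
    return len(q & d) / len(q | d) if q | d else 0.0


def rank(topic, candidates, ipc_weight=0.5):
    # Query text is built from the topic's title and claims, as in the abstract.
    docs_tokens = [tokenize(c["text"]) for c in candidates]
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(tokenize(topic["title"] + " " + topic["claims"]), idf)
    scored = []
    for cand, toks in zip(candidates, docs_tokens):
        text_score = cosine(q_vec, tfidf_vector(toks, idf))
        score = text_score + ipc_weight * ipc_overlap(topic["ipc"], cand["ipc"])
        scored.append((score, cand["id"]))
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy collection and topic, for illustration only.
candidates = [
    {"id": "A", "text": "rotor blade for a wind turbine with pitch control", "ipc": ["F03D"]},
    {"id": "B", "text": "pharmaceutical composition comprising an antibody", "ipc": ["A61K"]},
    {"id": "C", "text": "gear box housing for turbine", "ipc": ["F03D", "F16H"]},
]
topic = {
    "title": "Wind turbine blade",
    "claims": "A rotor blade with pitch control for a wind turbine",
    "ipc": ["F03D"],
}
ranking = rank(topic, candidates)
```

Because the IPC component does not depend on the words of the query, a topic written in a different language from the candidates can still be ranked by its classification codes alone, which is the coverage benefit the abstract points to.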

    Original language: English
    Title of host publication: Multilingual information access evaluation I: text retrieval experiments
    Editors: C Peters, GM DiNunzio, M Kurimo, T Mandl, D Mostefa, A Penas, G Roda
    Place of Publication: Berlin; Heidelberg
    Publisher: Springer, Springer Nature
    Number of pages: 8
    ISBN (Print): 9783642157530
    Publication status: Published - 2010
    Event: 10th Workshop of the Cross-Language Evaluation Forum - Corfu, Greece
    Duration: 30 Sept 2009 to 2 Oct 2009

    Publication series

    Name: Lecture Notes in Computer Science
    ISSN (Print): 0302-9743


    Conference: 10th Workshop of the Cross-Language Evaluation Forum

