Guided open vocabulary image captioning with constrained beam search

Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-reviewed

Abstract

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real-world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-training. Our method uses constrained beam search to force the inclusion of selected tag words in the output, and fixed, pretrained word embeddings to facilitate vocabulary expansion to previously unseen tag words. Using this approach we achieve state-of-the-art results for out-of-domain captioning on MSCOCO (and improved results for in-domain captioning). Perhaps surprisingly, our results significantly outperform approaches that incorporate the same tag predictions into the learning algorithm. We also show that we can significantly improve the quality of generated ImageNet captions by leveraging ground-truth labels.
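
The idea of forcing tag words into the output with constrained beam search can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes one common formulation, in which a separate beam is kept for each subset of constraint words already emitted and a caption is accepted only if it ends in the state where every constraint is satisfied. The names `constrained_beam_search`, `toy_step` and the `BIGRAMS` table are hypothetical, and the toy bigram scorer stands in for a real captioning model.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

Hyp = Tuple[float, Tuple[str, ...]]  # (cumulative log-prob, tokens so far)


def constrained_beam_search(
    step_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
    constraints: FrozenSet[str],
    beam_size: int = 2,
    max_len: int = 6,
    eos: str = "</s>",
) -> Hyp:
    """Return the best finished hypothesis containing every constraint word."""
    # One beam per state, where a state is the set of constraints satisfied so far.
    beams: Dict[FrozenSet[str], List[Hyp]] = {frozenset(): [(0.0, ())]}
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates: Dict[FrozenSet[str], List[Hyp]] = {}
        for state, hyps in beams.items():
            for score, tokens in hyps:
                for word, logp in step_fn(tokens).items():
                    new_hyp = (score + logp, tokens + (word,))
                    if word == eos:
                        # Accept only hypotheses that end in the fully satisfied state.
                        if state == constraints:
                            finished.append(new_hyp)
                        continue
                    # Emitting a constraint word moves the hypothesis to a new state.
                    new_state = state | {word} if word in constraints else state
                    candidates.setdefault(new_state, []).append(new_hyp)
        # Prune each state's beam independently to the top `beam_size` hypotheses.
        beams = {s: sorted(h, reverse=True)[:beam_size] for s, h in candidates.items()}
        if not beams:
            break
    if not finished:
        raise ValueError("no complete hypothesis satisfied all constraints")
    return max(finished)


# Hypothetical toy "language model": next-word probabilities given the last word.
BIGRAMS = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.7, "cat": 0.2, "zebra": 0.1},
    "the": {"dog": 0.5, "zebra": 0.5},
    "dog": {"runs": 0.6, "</s>": 0.4},
    "cat": {"runs": 0.5, "</s>": 0.5},
    "zebra": {"runs": 0.3, "</s>": 0.7},
    "runs": {"</s>": 1.0},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Log-probabilities of the next word given the prefix (bigram assumption)."""
    last = prefix[-1] if prefix else "<s>"
    return {w: math.log(p) for w, p in BIGRAMS[last].items()}


unconstrained = constrained_beam_search(toy_step, frozenset())
constrained = constrained_beam_search(toy_step, frozenset({"zebra"}))
print(" ".join(unconstrained[1]))  # a dog runs </s>
print(" ".join(constrained[1]))    # the zebra </s>
```

Because each state has its own beam, a low-probability hypothesis that already contains "zebra" is not pruned by higher-scoring hypotheses that lack it. The cost is that the number of beams grows with the number of subsets of constraint words, so this sketch is only practical for a handful of tags.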

Original language: English
Title of host publication: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: EMNLP 2017
Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Pages: 936-945
Number of pages: 10
ISBN (Electronic): 9781945626838
Publication status: Published - 2017
Event: Conference on Empirical Methods in Natural Language Processing (2017) - Copenhagen, Denmark
Duration: 9 Sept 2017 to 11 Sept 2017

Conference

Conference: Conference on Empirical Methods in Natural Language Processing (2017)
Abbreviated title: EMNLP 2017
Country/Territory: Denmark
City: Copenhagen
Period: 9/09/17 to 11/09/17

Bibliographical note

Copyright the Publisher 2017. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
