Towards generating stylized image captions via adversarial training

Omid Mohamad Nezami*, Mark Dras, Stephen Wan, Cécile Paris, Len Hamey

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

1 Citation (Scopus)

Abstract

While most image captioning aims to generate objective descriptions of images, the last few years have seen work on generating visually grounded image captions which have a specific style (e.g., incorporating positive or negative sentiment). However, because the stylistic component is typically the last part of training, current models usually pay more attention to the style at the expense of accurate content description. In addition, there is a lack of variability in terms of the stylistic aspects. To address these issues, we propose an image captioning model called ATTEND-GAN which has two core components: first, an attention-based caption generator to strongly correlate different parts of an image with different parts of a caption; and second, an adversarial training mechanism to assist the caption generator to add diverse stylistic components to the generated captions. Because of these components, ATTEND-GAN can generate correlated captions as well as more human-like variability of stylistic patterns. Our system outperforms the state-of-the-art as well as a collection of our baseline models. A linguistic analysis of the generated captions demonstrates that captions generated using ATTEND-GAN have a wider range of stylistic adjectives and adjective-noun pairs.

Original language: English
Title of host publication: PRICAI 2019
Subtitle of host publication: Trends in Artificial Intelligence - 16th Pacific Rim International Conference on Artificial Intelligence, Proceedings, Part I
Editors: Abhaya C. Nayak, Alok Sharma
Place of Publication: Switzerland
Publisher: Springer-VDI-Verlag GmbH & Co. KG
Pages: 270-284
Number of pages: 15
ISBN (Electronic): 9783030299088
ISBN (Print): 9783030299071
DOIs: https://doi.org/10.1007/978-3-030-29908-8_22
Publication status: Published - 1 Jan 2019
Event: 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2019 - Yanuka Island, Fiji
Duration: 26 Aug 2019 to 30 Aug 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11670 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2019
Country: Fiji
City: Yanuka Island
Period: 26/08/19 to 30/08/19

Keywords

  • Adversarial training
  • Attention mechanism
  • Image captioning

Cite this

Mohamad Nezami, O., Dras, M., Wan, S., Paris, C., & Hamey, L. (2019). Towards generating stylized image captions via adversarial training. In A. C. Nayak, & A. Sharma (Eds.), PRICAI 2019: Trends in Artificial Intelligence - 16th Pacific Rim International Conference on Artificial Intelligence, Proceedings, Part I (pp. 270-284). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11670 LNAI). Switzerland: Springer-VDI-Verlag GmbH & Co. KG. https://doi.org/10.1007/978-3-030-29908-8_22