TY - GEN
T1 - Towards generating stylized image captions via adversarial training
AU - Mohamad Nezami, Omid
AU - Dras, Mark
AU - Wan, Stephen
AU - Paris, Cécile
AU - Hamey, Len
PY - 2019/1/1
Y1 - 2019/1/1
AB - While most image captioning aims to generate objective descriptions of images, the last few years have seen work on generating visually grounded image captions that have a specific style (e.g., incorporating positive or negative sentiment). However, because the stylistic component is typically the last part of training, current models usually pay more attention to the style at the expense of accurate content description. In addition, there is a lack of stylistic variability in the generated captions. To address these issues, we propose an image captioning model called ATTEND-GAN with two core components: first, an attention-based caption generator that strongly correlates different parts of an image with different parts of a caption; and second, an adversarial training mechanism that assists the caption generator in adding diverse stylistic components to the generated captions. Because of these components, ATTEND-GAN can generate well-correlated captions with more human-like variability in stylistic patterns. Our system outperforms the state of the art as well as a collection of our baseline models. A linguistic analysis of the generated captions demonstrates that captions generated using ATTEND-GAN exhibit a wider range of stylistic adjectives and adjective-noun pairs.
KW - Adversarial training
KW - Attention mechanism
KW - Image captioning
UR - http://www.scopus.com/inward/record.url?scp=85072865368&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29908-8_22
DO - 10.1007/978-3-030-29908-8_22
M3 - Conference proceeding contribution
AN - SCOPUS:85072865368
SN - 9783030299071
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 270
EP - 284
BT - PRICAI 2019: Trends in Artificial Intelligence
A2 - Nayak, Abhaya C.
A2 - Sharma, Alok
PB - Springer
CY - Cham, Switzerland
T2 - 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2019
Y2 - 26 August 2019 through 30 August 2019
ER -