TY - CONF
T1 - Partially-supervised image captioning
AU - Anderson, Peter
AU - Gould, Stephen
AU - Johnson, Mark
PY - 2018
Y1 - 2018
AB - Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences, which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state-of-the-art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
UR - https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018
UR - http://www.scopus.com/inward/record.url?scp=85064841019&partnerID=8YFLogxK
M3 - Conference proceeding contribution
T3 - Advances in Neural Information Processing Systems
SP - 1
EP - 12
BT - Advances in Neural Information Processing Systems 31 (NIPS 2018)
A2 - Bengio, S.
A2 - Wallach, H.
A2 - Larochelle, H.
A2 - Grauman, K.
A2 - Cesa-Bianchi, N.
A2 - Garnett, R.
PB - Neural Information Processing Systems (NIPS) Foundation
CY - San Diego
T2 - 32nd Conference on Neural Information Processing Systems (NIPS)
Y2 - 2 December 2018 through 8 December 2018
ER -