Face-Cap: image captioning using facial expression analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › Research › peer-review

Abstract

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and interpersonal relationships represented therein. Towards developing a model that can produce human-like captions incorporating these, we use facial expression features extracted from images including human faces, with the aim of improving the descriptive ability of the model. In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions. Using all standard evaluation metrics, our Face-Cap models outperform a state-of-the-art baseline model for generating image captions when applied to an image caption dataset extracted from the standard Flickr30K dataset, consisting of around 11K images containing faces. An analysis of the captions finds that, perhaps surprisingly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions. Code related to this paper is available at: https://github.com/omidmn/Face-Cap.

Language: English
Title of host publication: Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML-PKDD 2018. Proceedings, Part I
Editors: Michele Berlingerio, Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Pages: 226-240
Number of pages: 15
ISBN (Electronic): 9783030109257
ISBN (Print): 9783030109240
DOIs: https://doi.org/10.1007/978-3-030-10925-7_14
Publication status: Published - 2019
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018 - Dublin, Ireland
Duration: 10 Sep 2018 - 14 Sep 2018

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer
Volume: 11051
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018
Country: Ireland
City: Dublin
Period: 10/09/18 - 14/09/18

Keywords

  • Image captioning
  • Facial expression recognition
  • Sentiment analysis
  • Deep learning

Cite this

Mohamad Nezami, O., Dras, M., Anderson, P., & Hamey, L. (2019). Face-Cap: image captioning using facial expression analysis. In M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Principles and Practice of Knowledge Discovery in Databases: European Conference, ECML-PKDD 2018. Proceedings, Part I (pp. 226-240). (Lecture Notes in Artificial Intelligence; Vol. 11051). Cham, Switzerland: Springer, Springer Nature. https://doi.org/10.1007/978-3-030-10925-7_14
Mohamad Nezami, Omid ; Dras, Mark ; Anderson, Peter ; Hamey, Leonard. / Face-Cap : image captioning using facial expression analysis. Machine Learning and Principles and Practice of Knowledge Discovery in Databases: European Conference, ECML-PKDD 2018. Proceedings, Part I. editor / Michele Berlingerio ; Francesco Bonchi ; Thomas Gärtner ; Neil Hurley ; Georgiana Ifrim. Cham, Switzerland : Springer, Springer Nature, 2019. pp. 226-240 (Lecture Notes in Artificial Intelligence).
@inproceedings{50a5a49d52d7443d9c5de836e6b5d7b4,
title = "Face-Cap: image captioning using facial expression analysis",
abstract = "Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and interpersonal relationships represented therein. Towards developing a model that can produce human-like captions incorporating these, we use facial expression features extracted from images including human faces, with the aim of improving the descriptive ability of the model. In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions. Using all standard evaluation metrics, our Face-Cap models outperform a state-of-the-art baseline model for generating image captions when applied to an image caption dataset extracted from the standard Flickr 30 K dataset, consisting of around 11 K images containing faces. An analysis of the captions finds that, perhaps surprisingly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions. Code related to this paper is available at: https://github.com/omidmn/Face-Cap.",
keywords = "Image captioning, Facial expression recognition, Sentiment analysis, Deep learning",
author = "{Mohamad Nezami}, Omid and Mark Dras and Peter Anderson and Leonard Hamey",
year = "2019",
doi = "10.1007/978-3-030-10925-7_14",
language = "English",
isbn = "9783030109240",
series = "Lecture Notes in Artificial Intelligence",
publisher = "Springer, Springer Nature",
pages = "226--240",
editor = "Michele Berlingerio and Francesco Bonchi and Thomas G{\"a}rtner and Neil Hurley and Georgiana Ifrim",
booktitle = "Machine Learning and Principles and Practice of Knowledge Discovery in Databases",
address = "Cham, Switzerland",

}

Mohamad Nezami, O, Dras, M, Anderson, P & Hamey, L 2019, Face-Cap: image captioning using facial expression analysis. in M Berlingerio, F Bonchi, T Gärtner, N Hurley & G Ifrim (eds), Machine Learning and Principles and Practice of Knowledge Discovery in Databases: European Conference, ECML-PKDD 2018. Proceedings, Part I. Lecture Notes in Artificial Intelligence, vol. 11051, Springer, Springer Nature, Cham, Switzerland, pp. 226-240, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018, Dublin, Ireland, 10/09/18. https://doi.org/10.1007/978-3-030-10925-7_14

Face-Cap : image captioning using facial expression analysis. / Mohamad Nezami, Omid; Dras, Mark; Anderson, Peter; Hamey, Leonard.

Machine Learning and Principles and Practice of Knowledge Discovery in Databases: European Conference, ECML-PKDD 2018. Proceedings, Part I. ed. / Michele Berlingerio; Francesco Bonchi; Thomas Gärtner; Neil Hurley; Georgiana Ifrim. Cham, Switzerland : Springer, Springer Nature, 2019. p. 226-240 (Lecture Notes in Artificial Intelligence; Vol. 11051).

TY - GEN

T1 - Face-Cap

T2 - image captioning using facial expression analysis

AU - Mohamad Nezami, Omid

AU - Dras, Mark

AU - Anderson, Peter

AU - Hamey, Leonard

PY - 2019

Y1 - 2019

N2 - Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and interpersonal relationships represented therein. Towards developing a model that can produce human-like captions incorporating these, we use facial expression features extracted from images including human faces, with the aim of improving the descriptive ability of the model. In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions. Using all standard evaluation metrics, our Face-Cap models outperform a state-of-the-art baseline model for generating image captions when applied to an image caption dataset extracted from the standard Flickr 30 K dataset, consisting of around 11 K images containing faces. An analysis of the captions finds that, perhaps surprisingly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions. Code related to this paper is available at: https://github.com/omidmn/Face-Cap.

AB - Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and interpersonal relationships represented therein. Towards developing a model that can produce human-like captions incorporating these, we use facial expression features extracted from images including human faces, with the aim of improving the descriptive ability of the model. In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions. Using all standard evaluation metrics, our Face-Cap models outperform a state-of-the-art baseline model for generating image captions when applied to an image caption dataset extracted from the standard Flickr 30 K dataset, consisting of around 11 K images containing faces. An analysis of the captions finds that, perhaps surprisingly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions. Code related to this paper is available at: https://github.com/omidmn/Face-Cap.

KW - Image captioning

KW - Facial expression recognition

KW - Sentiment analysis

KW - Deep learning

UR - http://www.scopus.com/inward/record.url?scp=85061138662&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-10925-7_14

DO - 10.1007/978-3-030-10925-7_14

M3 - Conference proceeding contribution

SN - 9783030109240

T3 - Lecture Notes in Artificial Intelligence

SP - 226

EP - 240

BT - Machine Learning and Principles and Practice of Knowledge Discovery in Databases

A2 - Berlingerio, Michele

A2 - Bonchi, Francesco

A2 - Gärtner, Thomas

A2 - Hurley, Neil

A2 - Ifrim, Georgiana

PB - Springer, Springer Nature

CY - Cham, Switzerland

ER -

Mohamad Nezami O, Dras M, Anderson P, Hamey L. Face-Cap: image captioning using facial expression analysis. In Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G, editors, Machine Learning and Principles and Practice of Knowledge Discovery in Databases: European Conference, ECML-PKDD 2018. Proceedings, Part I. Cham, Switzerland: Springer, Springer Nature. 2019. p. 226-240. (Lecture Notes in Artificial Intelligence). https://doi.org/10.1007/978-3-030-10925-7_14