Automatic, meta and human evaluation for multimodal summarization with multimodal output

Haojie Zhuang, Wei Emma Zhang, Leon Xie, Weitong Chen, Jian Yang, Quan Z. Sheng

Research output: Chapter in Book/Report/Conference proceedingConference proceeding contributionpeer-review

2 Citations (Scopus)

Abstract

Multimodal summarization with multimodal output (MSMO) has attracted increasing research interests recently as multimodal summary could provide more comprehensive information compared to text-only summary, effectively improving the user experience and satisfaction. As one of the most fundamental components for the development of MSMO, evaluation is an emerging yet underexplored research topic. In this paper, we fill this gap and propose a research framework that studies three research questions of MSMO evaluation: (1) Automatic Evaluation: We propose a novel metric mLLM-EVAL, which utilizes multimodal Large Language Model for MSMO EVALuation. (2) Meta-Evaluation: We create a meta-evaluation benchmark dataset by collecting human-annotated scores for multimodal summaries. With our benchmark, we conduct meta-evaluation analysis to assess the quality of different evaluation metrics and show the effectiveness of our proposed mLLM-EVAL. (3) Human Evaluation: To provide more objective and unbiased human annotations for meta-evaluation, we hypothesize and verify three types of cognitive biases in human evaluation. We also incorporate our findings into the human annotation process in the meta-evaluation benchmark. Overall, our research framework provides an evaluation metric, a meta-evaluation benchmark dataset annotated by humans and an analysis of cognitive biases in human evaluation, which we believe would serve as a valuable and comprehensive resource for the MSMO research community.

Original languageEnglish
Title of host publicationProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies (Volume 1: long papers)
Place of PublicationKerrville, TX
PublisherAssociation for Computational Linguistics (ACL)
Pages7768-7790
Number of pages23
ISBN (Electronic)9798891761148
DOIs
Publication statusPublished - 2024
Event2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 16 Jun 202421 Jun 2024

Conference

Conference2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/TerritoryMexico
CityHybrid, Mexico City
Period16/06/2421/06/24

Fingerprint

Dive into the research topics of 'Automatic, meta and human evaluation for multimodal summarization with multimodal output'. Together they form a unique fingerprint.

Cite this