PromDA: Prompt-based Data Augmentation for low-resource NLU tasks

Yufei Wang, Can Xu, Qingfeng Sun, Huang Hu, Chongyang Tao, Xiubo Geng, Daxin Jiang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids the human effort of collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that the synthetic data produced by PromDA boosts the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model that uses unlabeled in-domain data. The synthetic data from PromDA is also complementary to unlabeled in-domain data: NLU models improve further when the two are combined for training.

Original language: English
Title of host publication: The 60th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Proceedings of the Conference, Vol. 1 (Long Papers)
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 4242-4255
Number of pages: 14
ISBN (Electronic): 9781955917216
DOIs
Publication status: Published - 2022
Event: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration: 22 May 2022 to 27 May 2022

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 22/05/22 to 27/05/22

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.