
NAP2: a benchmark for naturalness and privacy-preserving text rewriting by learning from human

Shuo Huang, William MacLean, Xiaoxi Kang, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models (LLMs). Compared to the prior works on anonymization, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments. Our dataset is available at https://github.com/shuo956/NAP2-privacyrewrite.
Original language: English
Title of host publication: EMNLP 2025
Subtitle of host publication: the 2025 Conference on Empirical Methods in Natural Language Processing : Findings of EMNLP 2025
Place of Publication: Kerrville, TX
Publisher: Association for Computational Linguistics
Pages: 8954-8970
Number of pages: 17
ISBN (Electronic): 9798891763357
Publication status: Published - 2025
Event: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 to 9/11/25

Bibliographical note

Alternative title of the host publication: "Findings of the Association for Computational Linguistics: EMNLP 2025"
