The problem of text document obfuscation is to provide an automated mechanism which is able to make accessible the content of a text document without revealing the identity of its writer. This is more challenging than it seems, because an adversary equipped with powerful machine learning mechanisms is able to identify authorship (with good accuracy) where, for example, the name of the author has been redacted. Current obfuscation methods are ad hoc and have been shown to provide weak protection against such adversaries. Differential privacy, which is able to provide strong guarantees of privacy in some domains, has been thought not to be applicable to text processing. In this paper we will review obfuscation as a quantitative information flow problem and explain how generalised differential privacy can be applied to this problem to provide strong anonymisation guarantees in a standard model for text processing.
14 Jul 2018
22nd International Symposium on Formal Methods, FM 2018 Held as Part of the Federated Logic Conference, FloC 2018