Automated unsupervised authorship analysis using evidence accumulation clustering

Robert Layton*, Paul Watters, Richard Dazeley

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

34 Citations (Scopus)

Abstract

Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents.
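The general idea of evidence accumulation over n-gram features can be sketched as follows. This is a minimal illustration, not the NUANCE procedure itself, whose details the abstract does not give. It assumes character trigram frequency profiles, plain k-means as the base clusterer, and a final cut that merges documents co-clustered in more than half of the runs. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def ngram_profile(text, n=3):
    """Relative frequency of each overlapping character n-gram in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return {g: c / len(grams) for g, c in Counter(grams).items()}


def distance(p, q):
    """Squared Euclidean distance between two sparse profiles."""
    return sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in set(p) | set(q))


def mean_profile(profiles):
    """Component-wise mean of a non-empty list of sparse profiles."""
    total = Counter()
    for p in profiles:
        for g, v in p.items():
            total[g] += v
    return {g: v / len(profiles) for g, v in total.items()}


def kmeans(profiles, k, rng, iterations=10):
    """Plain k-means; returns one cluster label per profile."""
    centroids = rng.sample(profiles, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_profile(members)
    return labels


def cluster_by_evidence(docs, n=3, runs=50, threshold=0.5, seed=0):
    """Ensemble clustering; needs at least two documents.

    Returns one final cluster label per document.
    """
    profiles = [ngram_profile(d, n) for d in docs]
    size = len(profiles)
    rng = random.Random(seed)
    together = Counter()  # (i, j) -> number of runs that co-clustered i and j
    for _ in range(runs):
        k = rng.randint(2, max(2, size - 1))  # vary k so the runs disagree
        labels = kmeans(profiles, k, rng)
        for i, j in combinations(range(size), 2):
            if labels[i] == labels[j]:
                together[(i, j)] += 1

    # Assumed final step: merge every pair that shared a cluster in more
    # than `threshold` of the runs (connected components via union-find).
    parent = list(range(size))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for (i, j), count in together.items():
        if count > threshold * runs:
            parent[find(i)] = find(j)
    return [find(i) for i in range(size)]
```

Calling `cluster_by_evidence(list_of_texts)` returns one integer label per text; documents with equal labels were grouped together by the ensemble. The number of final clusters is not fixed in advance. It falls out of the co-association votes, which is what makes this style of method attractive when the number of authors is unknown.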

Original language: English
Pages (from-to): 95-120
Number of pages: 26
Journal: Natural Language Engineering
Volume: 19
Issue number: 1
Publication status: Published - Jan 2013
Externally published: Yes
