TY - JOUR
T1 - Automated unsupervised authorship analysis using evidence accumulation clustering
AU - Layton, Robert
AU - Watters, Paul
AU - Dazeley, Richard
PY - 2013/1
Y1 - 2013/1
N2 - Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents.
UR - http://www.scopus.com/inward/record.url?scp=84870521851&partnerID=8YFLogxK
U2 - 10.1017/S1351324911000313
DO - 10.1017/S1351324911000313
M3 - Article
AN - SCOPUS:84870521851
VL - 19
SP - 95
EP - 120
JO - Natural Language Engineering
JF - Natural Language Engineering
SN - 1351-3249
IS - 1
ER -