Authorship analysis of aliases: does topic influence accuracy?

Robert Layton, Paul A. Watters, Richard Dazeley

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


Aliases play an important role in online environments by facilitating anonymity, but they can also be used to hide the identity of cybercriminals. Previous studies have investigated this alias-matching problem in an attempt to determine whether two aliases belong to the same author, which can assist in identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that regularly achieve over 90% accuracy in recovering the linkage between these 'random sub-aliases'. In this paper, random sub-alias generation is shown to enable these high accuracies, and it therefore does not adequately model the real-world problem. In contrast, creating sub-aliases by topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be applied to datasets without topic control to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison of many authorship methods to determine which better match aliases under these conditions, finding that local n-gram methods perform better than the others.
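The contrast between random and topic-based sub-alias generation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-document topic labels, the greedy rule that balances topic groups between the two sides, and the use of a SCAP-style intersection of top-L character n-gram profiles as the "local n-gram" similarity are all assumptions made for the example. The parameters `n` and `L` are hypothetical defaults.

```python
import random
from collections import Counter


def random_split(docs, seed=0):
    """Randomly divide one alias's documents into two sub-aliases of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = docs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def topic_split(docs, topics):
    """Divide documents so that no topic appears in both sub-aliases.

    `topics` is an assumed per-document topic label. Documents are grouped by
    topic, and whole groups are assigned (largest first) to whichever side is
    currently smaller, so the two sub-aliases never share a topic.
    """
    groups = {}
    for doc, topic in zip(docs, topics):
        groups.setdefault(topic, []).append(doc)
    side_a, side_b = [], []
    for topic in sorted(groups, key=lambda t: len(groups[t]), reverse=True):
        (side_a if len(side_a) <= len(side_b) else side_b).extend(groups[topic])
    return side_a, side_b


def local_profile(docs, n=3, L=200):
    """Local profile: the L most frequent character n-grams across the documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc[i:i + n] for i in range(len(doc) - n + 1))
    return {gram for gram, _ in counts.most_common(L)}


def scap_similarity(docs_a, docs_b, n=3, L=200):
    """SCAP-style score: size of the intersection of the two local profiles."""
    return len(local_profile(docs_a, n, L) & local_profile(docs_b, n, L))


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "cats like warm mats",
            "stocks fell sharply today", "markets rallied on news"]
    topics = ["pets", "pets", "finance", "finance"]
    for name, (a, b) in [("random", random_split(docs, seed=1)),
                         ("topic", topic_split(docs, topics))]:
        print(name, scap_similarity(a, b))
```

Under random splitting, both halves tend to cover the same topics, so overlapping n-gram profiles can reflect shared subject vocabulary as much as shared writing style. The topic split removes that overlap by construction, which is consistent with the abstract's finding that topic-based sub-aliases are harder to match.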

Original language: English
Pages (from-to): 497-518
Number of pages: 22
Journal: Natural Language Engineering
Issue number: 4
Early online date: 8 Oct 2013
Publication status: Published - Aug 2015
Externally published: Yes

