A Data-driven approach to studying given names and their gender and ethnicity associations

Research output: Contribution to journal › Conference paper › peer-review

Abstract

Studying the structure of given names and how they associate with gender and ethnicity is an interesting research topic that has recently found practical uses in various areas. Given the paucity of annotated name data, we develop and make available a new dataset containing 14k given names. Using this dataset, we take a data-driven approach to this task and achieve up to 90% accuracy for classifying the gender of unseen names. For ethnicity identification, our system achieves 83% accuracy. We also experiment with a feature analysis method for exploring the most informative features for this task.
Original language: English
Pages (from-to): 145-149
Number of pages: 5
Journal: Proceedings of Australasian Language Technology Association Workshop 2014 : ALTA 2014
Publication status: Published - 2014
Event: Australasian Language Technology Association Workshop (12th : 2014) - Melbourne, Australia
Duration: 26 Nov 2014 to 28 Nov 2014
