Number theory meets linguistics: modelling noun pluralisation across 1497 languages using 2-adic metrics

Gregory Baker, Diego Molla-Aliod

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

Abstract

A simple machine learning model of pluralisation as a linear regression problem minimising a p-adic metric substantially outperforms even the most robust of Euclidean-space regressors on languages in the Indo-European, Austronesian, Trans-New Guinea, Sino-Tibetan, Nilo-Saharan, Oto-Manguean and Atlantic-Congo language families. There is insufficient evidence to support modelling distinct noun declensions as a p-adic neighbourhood even in Indo-European languages.
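
To make the notion of a p-adic metric concrete, the following is a minimal sketch of the 2-adic distance between integers and a summed loss built from it. It is not the authors' implementation: how word forms are encoded as integers and how the regression is fitted are not described in this record, and the names `p_adic_valuation`, `p_adic_distance` and `p_adic_loss` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def p_adic_valuation(n: int, p: int = 2) -> int:
    """Return the largest k such that p**k divides the non-zero integer n."""
    if n == 0:
        raise ValueError("the p-adic valuation of 0 is infinite")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_distance(a: int, b: int, p: int = 2) -> Fraction:
    """p-adic distance |a - b|_p = p**(-v_p(a - b)), with distance 0 when a == b."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(a - b, p))


def p_adic_loss(predictions, targets, p: int = 2) -> Fraction:
    """Sum of p-adic distances between predicted and target integers.

    Assumption: a regressor 'minimising a p-adic metric' would minimise a
    quantity of this general shape; the exact objective is not given here.
    """
    return sum(
        (p_adic_distance(y_hat, y, p) for y_hat, y in zip(predictions, targets)),
        Fraction(0),
    )
```

Under this metric two integers are close when their difference is divisible by a high power of p, so 8 and 0 are closer 2-adically (distance 1/8) than 3 and 4 (distance 1), the opposite of their Euclidean ordering.
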
Original language: English
Title of host publication: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (volume 2: short papers)
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Pages: 24-32
Number of pages: 9
ISBN (Electronic): 9781955917643
Publication status: Published - 2022
Event: The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing - Online
Duration: 20 Nov 2022 to 23 Nov 2022

Conference

Conference: The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
Abbreviated title: AACL-IJCNLP 2022
City: Online
Period: 20/11/22 to 23/11/22
