Galaxy and Mass Assembly: automatic morphological classification of galaxies using statistical learning

Sreevarsha Sreejith, Sergiy Pereverzyev, Lee S. Kelvin, Francine R. Marleau, Markus Haltmeier, Judith Ebner, Joss Bland-Hawthorn, Simon P. Driver, Alister W. Graham, Benne W. Holwerda, Andrew M. Hopkins, Jochen Liske, Jon Loveday, Amanda J. Moffett, Kevin A. Pimbblet, Edward N. Taylor, Lingyu Wang, Angus H. Wright

Research output: Contribution to journal, Article, peer-review

22 Citations (Scopus)


We apply four statistical learning methods to a sample of 7941 galaxies (z < 0.06) from the Galaxy And Mass Assembly survey to test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF), and Neural Networks, which return True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Cases in which all four algorithms agree with each other yet disagree with the visual classification ('unanimous disagreement') serve as a potential indicator of human error in classification, occurring in ~9 per cent of ellipticals, ~9 per cent of little blue spheroids, ~14 per cent of early-type spirals, ~21 per cent of intermediate-type spirals, and ~4 per cent of late-type spirals and irregulars. We observe that the choice of parameters, rather than the choice of algorithm, is more crucial in determining classification accuracy. Owing to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent; and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr) systems, achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.
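The two quantities the abstract relies on, the per-class TPR and the 'unanimous disagreement' flag, can be illustrated with a minimal Python sketch. It assumes that a TPR is the fraction of galaxies of a given visual class that an algorithm assigns to that same class; the abstract does not spell out the definition, so this reading is an assumption. The function names and the toy labels are hypothetical and are not taken from the paper or from GAMA data.

```python
from collections import Counter


def true_prediction_ratios(visual, predicted):
    """Per-class fraction of galaxies whose predicted label matches the visual label."""
    totals = Counter(visual)
    hits = Counter(v for v, p in zip(visual, predicted) if v == p)
    # Counter returns 0 for classes with no correct predictions.
    return {cls: hits[cls] / totals[cls] for cls in totals}


def unanimous_disagreement(visual, predictions_by_algorithm):
    """Indices where all algorithms agree with each other but not with the visual label."""
    flagged = []
    for i, label in enumerate(visual):
        votes = {preds[i] for preds in predictions_by_algorithm}
        if len(votes) == 1 and label not in votes:
            flagged.append(i)
    return flagged


# Toy labels invented purely for illustration (not GAMA data).
visual = ["E", "E", "LBS", "Sd-Irr", "Sd-Irr", "S0-Sa"]
svm = ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"]
ct = ["E", "S0-Sa", "E", "Sd-Irr", "Sab-Scd", "E"]
ctrf = ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"]
nn = ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"]

tpr_ctrf = true_prediction_ratios(visual, ctrf)
flagged = unanimous_disagreement(visual, [svm, ct, ctrf, nn])
print(tpr_ctrf)
print(flagged)
```

In this toy set the flagged galaxies are the ones a human classifier might want to re-inspect, which is the role the abstract suggests for unanimous disagreement.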

Original language: English
Pages (from-to): 5232-5258
Number of pages: 27
Journal: Monthly Notices of the Royal Astronomical Society
Issue number: 4
Publication status: Published - 1 Mar 2018
Externally published: Yes


  • Galaxies: fundamental parameters
  • Galaxies: general
  • Galaxies: structure
  • Methods: statistical

