A simple way to improve multivariate analyses of paleoecological data sets

John Alroy*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    12 Citations (Scopus)


    Multivariate methods such as cluster analysis and ordination are basic to paleoecology, but the messy nature of fossil occurrence data often makes it difficult to recover clear patterns. A recently described faunal similarity index based on the Forbes coefficient improves results when its complement is employed as a distance metric. This index involves adding terms to the Forbes equation and ignoring one of the counts it employs (that of species found in neither of the samples under consideration). Analyses of simulated data matrices demonstrate its advantages. These matrices include large and small samples from two partially overlapping species pools. In a cluster analysis, the widely used Dice coefficient and the Euclidean distance metric both create groupings that reflect sample size, the Simpson index suggests large differences that do not exist, and the corrected Forbes index creates groupings based strictly on true faunal overlap. In a principal coordinates analysis (PCoA), the Forbes index almost removes the sample-size signal, but other approaches create a second axis strongly dominated by sample size. Meanwhile, species lists of late Pleistocene mammals from the United States capture biogeographic signals that standard ordination methods do recover, but the adjusted Forbes coefficient spaces the points out more sensibly. Finally, when biome-scale lists for living mammals are added to the data set and extinct species are removed, correspondence analysis misleadingly separates out the biome lists, and PCoA based on the Dice coefficient places them at the edge of the cloud of fossil assemblage data points. PCoA based on the Forbes index places them in more reasonable positions. Thus, only the adjusted Forbes index is able to recover true biological patterns. These results suggest that the index may be useful in analyzing not only paleontological data sets but any data set that includes species lists having highly variable lengths.
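The indices named in the abstract can be compared on a toy pair of species lists. The following is a minimal sketch, not the author's code. It uses the standard definitions of the Dice, Simpson, and Forbes coefficients, where a is the number of species shared by both samples, b and c are the numbers unique to each sample, and d is the number found in neither. The abstract does not state the adjusted equation, so the form used in `adjusted_forbes` is an assumption based on the published description of the corrected index and should be checked against the original paper. All function names are hypothetical.

```python
from math import sqrt


def overlap_counts(sample_x, sample_y):
    """Return (a, b, c): shared species, unique to x, unique to y."""
    x, y = set(sample_x), set(sample_y)
    return len(x & y), len(x - y), len(y - x)


def dice(a, b, c):
    # Standard Dice coefficient: 2a / (2a + b + c).
    return 2 * a / (2 * a + b + c)


def simpson(a, b, c):
    # Standard Simpson index: a / (a + min(b, c)).
    return a / (a + min(b, c))


def forbes(a, b, c, d):
    # Classic Forbes coefficient, which needs d (species in neither sample).
    n = a + b + c + d
    return a * n / ((a + b) * (a + c))


def adjusted_forbes(a, b, c):
    # ASSUMED form of the corrected index (not given in the abstract):
    # a(n + sqrt(n)) / (a(n + sqrt(n)) + 1.5 * b * c), with n = a + b + c,
    # so d is ignored as the abstract describes.
    n = a + b + c
    w = a * (n + sqrt(n))
    return w / (w + 1.5 * b * c)


# Toy case: a small sample fully nested inside a large one.
large = range(40)   # 40 species
small = range(10)   # 10 species, all also present in the large sample
a, b, c = overlap_counts(large, small)   # (10, 30, 0)

dice_distance = 1 - dice(a, b, c)                 # complement used as a distance
forbes_distance = 1 - adjusted_forbes(a, b, c)    # complement used as a distance
```

In this nested case the Dice distance is 0.6 even though every species in the small sample also occurs in the large one, while the complement of the assumed adjusted Forbes index is 0. That illustrates the sample-size sensitivity the abstract attributes to Dice. It is not a reproduction of the paper's simulations.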

    Original language: English
    Pages (from-to): 377-386
    Number of pages: 10
    Issue number: 3
    Publication status: Published - Jun 2015

