A simple approach for maximizing the overlap of phylogenetic and comparative data

Matthew W. Pennell*, Richard G. FitzJohn, William K. Cornwell

*Corresponding author for this work

Research output: Contribution to journal (article)

20 Citations (Scopus)

Abstract

Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between the species for which there are phylogenetic data and those for which other data are available. As a result, researchers are commonly forced either to drop species from analyses entirely or to impute the missing data. A simple strategy to improve the overlap of phylogenetic and comparative data is to swap species in the tree that lack data with ‘phylogenetically equivalent’ species that have data. While this procedure is logically straightforward, it quickly becomes very challenging to do by hand. Here, we present algorithms that use topological and taxonomic information to maximize the number of swaps without altering the structure of the phylogeny. We have implemented our method in a new R package, phyndr, which will allow researchers to apply our algorithm to empirical data sets. The implementation is efficient enough that taxon swaps can be computed quickly, even for large trees. To facilitate the use of taxonomic knowledge, we created a separate data package, taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr. Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort, species mismatch among data sources will increasingly be a problem; evolutionary informatics tools, such as phyndr and taxonlookup, can help alleviate this issue.
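
To make the swapping idea concrete, here is a minimal Python sketch. It is not the phyndr algorithm or its API. It is a deliberately simplified, taxonomy-only illustration under several assumptions that are not in the abstract: species names are binomials of the form `Genus_epithet`, genera are monophyletic, and a tip counts as swappable only when it is the sole representative of its genus in the tree, so that replacing it cannot change the topology. The function names `genus_of` and `candidate_swaps` are hypothetical.

```python
from collections import defaultdict


def genus_of(species):
    # Assumption: names look like "Genus_epithet".
    return species.split("_")[0]


def candidate_swaps(tree_tips, species_with_data):
    """Map each swappable tip to the congeners that have data.

    A tip is treated as swappable only if it lacks data and is the
    only tip of its genus in the tree (a simplifying assumption).
    """
    data = set(species_with_data)

    # Group the tree's tips by genus.
    tips_by_genus = defaultdict(list)
    for tip in tree_tips:
        tips_by_genus[genus_of(tip)].append(tip)

    # Group the species that have data by genus (sorted for stable output).
    data_by_genus = defaultdict(list)
    for sp in sorted(data):
        data_by_genus[genus_of(sp)].append(sp)

    swaps = {}
    for genus, tips in tips_by_genus.items():
        if len(tips) != 1:
            continue  # several congeners in the tree: skip in this sketch
        tip = tips[0]
        if tip in data:
            continue  # the tip already has data, nothing to swap
        candidates = data_by_genus.get(genus, [])
        if candidates:
            swaps[tip] = candidates
    return swaps


tips = ["Quercus_alba", "Acer_rubrum", "Acer_saccharum", "Pinus_taeda"]
with_data = ["Quercus_robur", "Quercus_rubra", "Acer_rubrum", "Picea_abies"]
swaps = candidate_swaps(tips, with_data)
print(swaps)
```

In this toy input, only `Quercus_alba` qualifies: it lacks data, is the only *Quercus* tip, and has two congeners with data. The abstract states that the actual method also uses topological information, which this sketch omits entirely.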

Original language: English
Pages (from-to): 751-758
Number of pages: 8
Journal: Methods in Ecology and Evolution
Volume: 7
Issue number: 6
Publication status: Published - 1 Jun 2016

Keywords

  • data imputation
  • evolutionary informatics
  • missing data
  • phylogenetic comparative methods
  • taxonomy
