Distinguishing phylogenetic signal from homoplasy (shared similarities among taxa that do not arise by common ancestry) is an implicit goal of any phylogenetic study. Large amounts of homoplasy can interfere with accurate tree inference, and common measures of clade support, including bootstrap proportions and Bayesian posterior probabilities, are expected to be affected to some degree as well. Through data simulation and analysis of 38 empirical data sets, we show that high amounts of homoplasy affect all measures of clade support in a manner that depends on clade size. More specifically, the smallest taxon bipartitions in an unrooted tree topology receive higher support than clades of intermediate size, even when all clades are supported by the same amount of data. We determine that the ultimate causes of this effect are the inclusion of random trees (due to homoplasy) during bootstrap resampling and Markov chain Monte Carlo (MCMC) topology searching, and the higher proportion of small taxon bipartitions (i.e., those of 2 or 3 taxa) relative to larger bipartitions. However, the use of explicit model-based methods, especially Bayesian MCMC methods, effectively overcomes this clade size effect even when very small amounts of phylogenetic signal are present. We develop a post hoc statistic, the clade disparity index (CDI), to measure both the relative magnitude of the clade size effect and its statistical significance. In analyses of both simulated and empirical data, CDI values indicate that Bayesian MCMC analyses are substantially more likely than maximum parsimony and maximum likelihood bootstrap analyses to estimate clade support values that are uncorrelated with clade size, and are thus less affected by homoplasy.
These results may be especially relevant to "deep" phylogenetic problems, such as reconstructing the tree of life, because such problems represent the largest possible extremes of time and evolutionary rates, 2 factors that cause homoplasy.
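
One ingredient of the clade size effect described above, the predominance of small bipartitions among random trees, can be illustrated with a short simulation. The sketch below is not the simulation protocol or the CDI used in this study. It assumes that "random trees" means topologies drawn uniformly from all labelled unrooted binary trees, and the function names (`random_tree_edges`, `bipartition_sizes`, `size_distribution`) are hypothetical. It tallies how often each nontrivial bipartition size (measured as the smaller side of the split) occurs.

```python
import random
from collections import Counter


def random_tree_edges(n_taxa, rng):
    """Draw an unrooted binary topology by random stepwise addition.

    Leaves are nodes 0..n_taxa-1; internal nodes are numbered from n_taxa.
    Attaching each new leaf to a uniformly chosen edge gives a uniform
    draw over labelled unrooted binary topologies (assumed null model).
    """
    if n_taxa < 4:
        raise ValueError("need at least 4 taxa for a nontrivial bipartition")
    hub = n_taxa
    edges = [(0, hub), (1, hub), (2, hub)]  # three-taxon star
    next_internal = n_taxa + 1
    for leaf in range(3, n_taxa):
        i = rng.randrange(len(edges))
        u, v = edges[i]
        w = next_internal
        next_internal += 1
        # Subdivide edge (u, v) with new node w, then hang the leaf on w.
        edges[i] = (u, w)
        edges.append((w, v))
        edges.append((w, leaf))
    return edges


def bipartition_sizes(edges, n_taxa):
    """Return the smaller-side size of every internal-edge bipartition."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    root = n_taxa  # the first internal node always exists
    parent = {root: None}
    order, stack = [], [root]
    while stack:  # iterative preorder traversal
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb != parent[node]:
                parent[nb] = node
                stack.append(nb)
    leaves_below = {}
    sizes = []
    for node in reversed(order):  # descendants are visited before parents
        if node < n_taxa:
            leaves_below[node] = 1
            continue
        below = sum(leaves_below[nb] for nb in adj[node] if nb != parent[node])
        leaves_below[node] = below
        if node != root:  # edge (parent, node) is an internal edge
            sizes.append(min(below, n_taxa - below))
    return sizes


def size_distribution(n_taxa, n_trees, seed=0):
    """Proportion of bipartitions of each size across random trees."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trees):
        counts.update(bipartition_sizes(random_tree_edges(n_taxa, rng), n_taxa))
    total = sum(counts.values())
    return {size: counts[size] / total for size in sorted(counts)}
```

Calling `size_distribution(10, 2000)` returns a mapping from bipartition size to its proportion among all internal bipartitions sampled. Under this uniform null model, 2-taxon bipartitions are the most frequent class, which is consistent with the mechanism proposed above. The sketch says nothing about support values themselves.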
- Bayesian posterior probability
- Clade size
- Prior probability