Ratios in disguise, truths arise: glycomics meets compositional data analysis

Alexander R. Bennett, Jon Lundstrøm, Sayantani Chatterjee, Morten Thaysen-Andersen, Daniel Bojar

Research output: Working paper, Preprint

Abstract

Comparative glycomics data are an instance of compositional data defined by the Aitchison simplex, where measured glycans are parts of a whole, indicated by relative abundances, which are then compared between conditions. Applying traditional statistical analyses to this type of data often results in misleading conclusions, such as spurious “decreases” of glycans between conditions when other structures sharply increase in abundance, or routine false-positive rates of >25% for differential abundance. Our work introduces a compositional data analysis framework, specifically tailored to comparative glycomics, to account for these data dependencies. We employ center log-ratio (CLR) and additive log-ratio (ALR) transformations, augmented with a model incorporating scale uncertainty/information, to introduce the most robust and sensitive glycomics data analysis pipeline. Applied to many publicly available comparative glycomics datasets, we show that this model controls false-positive rates and results in new biological findings. Additionally, we present new modalities to analyze comparative glycomics data with this framework. Alpha- and beta-diversity enable exploration of glycan distributions within and between biological samples, while cross-class glycan correlations shed light on complex and previously undetected interdependencies. These new approaches have revealed deeper insights into glycome variations that are critical to understanding the roles of glycans in health and disease.
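The spurious "decrease" described in the abstract, and the log-ratio transformations used to address it, can be illustrated with a minimal sketch. The abundances below are hypothetical and not taken from the preprint, and the helper functions `closure`, `clr`, and `alr` are illustrative names rather than the authors' pipeline. The scale-uncertainty model mentioned in the abstract is not reproduced here.

```python
import numpy as np

def closure(x):
    """Rescale abundances so each sample sums to 1 (relative abundances)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the mean log (log geometric mean)."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x, ref=-1):
    """Additive log-ratio: log of each non-reference part over the reference part."""
    logx = np.log(x)
    ref_col = np.take(logx, [ref], axis=-1)   # keep the axis so broadcasting works
    others = np.delete(logx, ref, axis=-1)    # drop the reference part
    return others - ref_col

# Hypothetical absolute abundances of three glycans (assumed values).
# Only the third glycan truly changes between conditions.
control = np.array([100.0, 100.0, 100.0])
treated = np.array([100.0, 100.0, 400.0])

rel_control = closure(control)
rel_treated = closure(treated)

# Relative abundances of the two unchanged glycans drop, although nothing happened to them.
rel_change = rel_treated - rel_control

# ALR against an unchanged reference (glycan 0) reports no change for glycan 1
# and a log fold change of log(4) for glycan 2.
alr_change = alr(rel_treated, ref=0) - alr(rel_control, ref=0)

# CLR values sum to zero per sample; changes are relative to the geometric mean.
clr_change = clr(rel_treated) - clr(rel_control)

print("relative change:", rel_change)
print("ALR change     :", alr_change)
print("CLR change     :", clr_change)
```

In this toy case the CLR differences of the unchanged glycans are still negative, because CLR measures each glycan against the sample's geometric mean, which the increased glycan shifts. This is one way to see why a transformation alone may not settle the question and why information about the overall scale matters.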
Original language: English
Publication status: Submitted - 10 Jun 2024

Publication series

Name: bioRxiv