Is newer better? - Evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae

Katherine James, Anil Wipat, Jennifer Hallinan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system depends upon the quality of its component data and upon the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly curated and widely used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that, globally, network performance improves over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes, and performance can be improved by selecting and integrating relevant datasets. When using any type of integrated system to answer a specific biological question, careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.

Original language: English
Pages (from-to): 715-727
Number of pages: 13
Journal: Integrative Biology (United Kingdom)
Issue number: 7
Publication status: Published - Jul 2012
Externally published: Yes

