bcGST: an interactive bias-correction method to identify over-represented gene-sets in boutique arrays

Kevin Y. X. Wang, Alexander M. Menzies, Ines P. Silva, James S. Wilmott, Yibing Yan, Matthew Wongchenko, Richard F. Kefford, Richard A. Scolyer, Georgina V. Long, Garth Tarr, Samuel Mueller, Jean Y. H. Yang

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Motivation: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes describe the biological functions of genes and their associated pathways, and are important tools in the Gene-Set Test (GST). GST aims to establish an association between a gene-set of interest and an annotation; specifically, it tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete genome or a very large proportion of it. However, this assumption holds neither for the increasingly popular boutique arrays nor for custom-designed gene expression profiling platforms. For these platforms, conventional GST is no longer appropriate because of the gene-set selection bias induced during their construction.

Results: We propose bcGST, a bias-corrected GST that introduces bias-correction terms into the contingency table used to calculate Fisher's Exact Test. The adjustment works by estimating the proportion of genes captured on the array relative to the genome, which helps filter annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality and stability of bcGST through multiple differential gene expression analyses in melanoma and the Cancer Genome Atlas cancer studies.

Availability and implementation: The bcGST method is available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/.
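As a rough illustration of the conventional GST that the abstract describes, the sketch below computes the one-sided Fisher's Exact Test p-value for over-representation from the counts of a 2x2 contingency table. It does not reproduce the bcGST bias-correction terms, which the abstract does not specify. The function name `over_representation_p` and all counts are hypothetical, chosen only to show how the choice of background (array versus genome) changes the result.

```python
from math import comb


def over_representation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test p-value, P(X >= k).

    N: genes in the background (the "universe")
    K: background genes annotated to the term
    n: genes in the gene-set of interest (e.g. differentially expressed)
    k: genes of interest that are annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent with a 2x2 table")
    total = comb(N, n)
    # Sum the hypergeometric upper tail with exact integers, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts (assumptions, not taken from the paper).
k, n = 15, 100  # 15 of 100 genes of interest are annotated to the term

# Background = a boutique array whose design favours the term.
p_array = over_representation_p(k, n, K=60, N=700)

# Background = a genome-wide platform with the same term.
p_genome = over_representation_p(k, n, K=300, N=20000)

print(f"array background:  p = {p_array:.3g}")
print(f"genome background: p = {p_genome:.3g}")
```

The same observed overlap yields very different p-values under the two backgrounds, which is the kind of platform-induced selection bias the abstract says bcGST is designed to correct.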

Language: English
Pages: 1350-1357
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 8
Early online date: 12 Sep 2018
DOIs: https://doi.org/10.1093/bioinformatics/bty783
Publication status: Published - 15 Apr 2019

Fingerprint

Bias Correction
Genes
Gene
Test Set
Annotation
Genome
Molecular Sequence Annotation
Gene Expression
Pathway
Cancer
Proportion
Term
Gene Ontology
Fisher's Exact Test
Selection Bias
Atlases
Gene Expression Profiling
Melanoma
Differential Expression

Cite this

Wang, K. Y. X., Menzies, A. M., Silva, I. P., Wilmott, J. S., Yan, Y., Wongchenko, M., ... Yang, J. Y. H. (2019). bcGST: an interactive bias-correction method to identify over-represented gene-sets in boutique arrays. Bioinformatics, 35(8), 1350-1357. https://doi.org/10.1093/bioinformatics/bty783
Wang, Kevin Y. X. ; Menzies, Alexander M. ; Silva, Ines P. ; Wilmott, James S. ; Yan, Yibing ; Wongchenko, Matthew ; Kefford, Richard F. ; Scolyer, Richard A. ; Long, Georgina V. ; Tarr, Garth ; Mueller, Samuel ; Yang, Jean Y. H. / bcGST : an interactive bias-correction method to identify over-represented gene-sets in boutique arrays. In: Bioinformatics. 2019 ; Vol. 35, No. 8. pp. 1350-1357.
@article{a4db859273644ac487f482d492a60e43,
title = "bcGST: an interactive bias-correction method to identify over-represented gene-sets in boutique arrays",
abstract = "Motivation: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes are important tools in Gene-Set Test (GST) that describe gene biological functions and associated pathways. GST aims to establish an association relationship between a gene-set of interest and an annotation. Importantly, GST tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete or a very large proportion of the genome. However, this assumption is neither satisfied for the increasingly popular boutique array nor the custom designed gene expression profiling platform. Specifically, conventional GST is no longer appropriate due to the gene-set selection bias induced during the construction of these platforms. Results: We propose bcGST, a bias-corrected GST by introducing bias-correction terms in the contingency table needed for calculating the Fisher's Exact Test. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality of bcGST and its stability through multiple differential gene expression analyses in melanoma and the Cancer Genome Atlas cancer studies. Availability and implementation: The bcGST method is made available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/.",
author = "Wang, {Kevin Y. X.} and Menzies, {Alexander M.} and Silva, {Ines P.} and Wilmott, {James S.} and Yan, Yibing and Wongchenko, Matthew and Kefford, {Richard F.} and Scolyer, {Richard A.} and Long, {Georgina V.} and Tarr, Garth and Mueller, Samuel and Yang, {Jean Y. H.}",
year = "2019",
month = "4",
day = "15",
doi = "10.1093/bioinformatics/bty783",
language = "English",
volume = "35",
pages = "1350--1357",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "8",
}

Wang, KYX, Menzies, AM, Silva, IP, Wilmott, JS, Yan, Y, Wongchenko, M, Kefford, RF, Scolyer, RA, Long, GV, Tarr, G, Mueller, S & Yang, JYH 2019, 'bcGST: an interactive bias-correction method to identify over-represented gene-sets in boutique arrays', Bioinformatics, vol. 35, no. 8, pp. 1350-1357. https://doi.org/10.1093/bioinformatics/bty783

TY - JOUR
T1 - bcGST
T2 - Bioinformatics
AU - Wang, Kevin Y. X.
AU - Menzies, Alexander M.
AU - Silva, Ines P.
AU - Wilmott, James S.
AU - Yan, Yibing
AU - Wongchenko, Matthew
AU - Kefford, Richard F.
AU - Scolyer, Richard A.
AU - Long, Georgina V.
AU - Tarr, Garth
AU - Mueller, Samuel
AU - Yang, Jean Y. H.
PY - 2019/4/15
Y1 - 2019/4/15
AB - Motivation: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes are important tools in Gene-Set Test (GST) that describe gene biological functions and associated pathways. GST aims to establish an association relationship between a gene-set of interest and an annotation. Importantly, GST tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete or a very large proportion of the genome. However, this assumption is neither satisfied for the increasingly popular boutique array nor the custom designed gene expression profiling platform. Specifically, conventional GST is no longer appropriate due to the gene-set selection bias induced during the construction of these platforms. Results: We propose bcGST, a bias-corrected GST by introducing bias-correction terms in the contingency table needed for calculating the Fisher's Exact Test. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality of bcGST and its stability through multiple differential gene expression analyses in melanoma and the Cancer Genome Atlas cancer studies. Availability and implementation: The bcGST method is made available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/.

UR - http://www.scopus.com/inward/record.url?scp=85068495845&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty783
DO - 10.1093/bioinformatics/bty783
M3 - Article
VL - 35
SP - 1350
EP - 1357
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 8
ER -