We model DNA read count data obtained through next-generation sequencing (NGS) technologies as a multiple change-point process, in which the change-points divide the data into different segments. Each segment of the process is modeled using the zero-inflated negative binomial (ZINB) or the negative binomial (NB) distribution within the generalized additive models for location, scale and shape (GAMLSS) framework. Because the observed read counts are highly overdispersed as well as zero-inflated, the ZINB- and NB-based models fit the data better than the competing Poisson model. Moreover, we use the GAMLSS framework to incorporate auxiliary information and further improve the change-point modelling process. The extended Cross-Entropy (CE) method, which uses a four-parameter beta distribution, is used to estimate the number of change-points as well as their corresponding genome locations. Because the procedures are highly computationally intensive, a parallel implementation results in a significant improvement in total running time. We apply the proposed methodology to find change-points in DNA read count data obtained through Illumina TruSeq exome capture of patients with celiac disease. Our results suggest that the proposed GAMLSS-based CE method is an effective methodology for detecting change-points in genome-wide data.
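As a rough illustration of the segment-wise modelling described in the abstract, the following Python sketch evaluates a ZINB log-likelihood for a given set of change-points. It rests on several assumptions that are not taken from the paper: the NB distribution is parameterized by a mean `mu` and a dispersion `size` (variance `mu + mu**2 / size`), the zero-inflation probability is `pi`, and the per-segment parameters are supplied by hand rather than fitted through GAMLSS. The function names are hypothetical, and the Cross-Entropy search over change-point locations is not implemented here.

```python
import math


def nb_logpmf(y, mu, size):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion size > 0."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def zinb_logpmf(y, mu, size, pi):
    """Log-pmf of a zero-inflated NB: a structural zero occurs with probability pi."""
    nb = nb_logpmf(y, mu, size)
    if y == 0:
        # A zero is either a structural zero or an ordinary NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def segmented_loglik(counts, change_points, params):
    """Total ZINB log-likelihood of counts split at the given change-points.

    change_points: sorted indices where a new segment starts.
    params: one (mu, size, pi) tuple per segment (assumed known here).
    """
    bounds = [0, *change_points, len(counts)]
    total = 0.0
    for start, end, (mu, size, pi) in zip(bounds[:-1], bounds[1:], params):
        total += sum(zinb_logpmf(y, mu, size, pi) for y in counts[start:end])
    return total


# Toy read counts (made up for illustration): low coverage, then high coverage.
counts = [0, 1, 0, 2, 1, 0, 20, 25, 18, 30, 22, 0]
params = [(1.0, 2.0, 0.3), (20.0, 5.0, 0.1)]

ll_good = segmented_loglik(counts, [6], params)  # split where the level shifts
ll_bad = segmented_loglik(counts, [3], params)   # split placed too early
```

In this toy example `ll_good` exceeds `ll_bad`, which is the kind of comparison a change-point search would use to rank candidate segmentations.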
|Title of host publication||Proceedings of the 28th International Workshop on Statistical Modelling|
|Editors||V. M. R. Muggeo, V. Capursi, G. Boscaino, G. Lovison|
|Place of Publication||Palermo, Italy|
|Publisher||Università di Palermo|
|Number of pages||5|
|Publication status||Published - 2013|
|Event||International Workshop on Statistical Modelling (28th : 2013) - Palermo, Italy|
Duration: 8 Jul 2013 → 12 Jul 2013
- Cross-Entropy Method
- Change-Point Modelling
- Combinatorial Optimization
Priyadarshana, M. W. J. R., & Sofronov, G. (2013). GAMLSS and extended Cross-Entropy method to detect multiple change-points in DNA read count data. In V. M. R. Muggeo, V. Capursi, G. Boscaino, & G. Lovison (Eds.), Proceedings of the 28th International Workshop on Statistical Modelling (Vol. 1, pp. 453-457). Palermo, Italy: Università di Palermo.