An improved hybrid algorithm for multiple change-point detection in array CGH data

G. Y. Sofronov, T. V. Polushina, M. W. Jayawardana

    Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > Research > peer-review

    Abstract

    A human genome is highly structured. Usually, the structure forms regions having patterns of a specific property. It is well known that the analysis of biological sequences often involves measurements of gene expression levels. When these observations are ordered by their location on the genome, the values form clouds with different observed means, supposedly reflecting different mean levels. The statistical analysis of these sequences aims at finding chromosomal regions with “abnormal” (increased or decreased) mean levels. Therefore, identifying genomic regions associated with systematic aberrations provides insights into the initiation and progression of a disease, and improves diagnosis, prognosis and therapy strategies.
    In this paper, we present a further extension of our work, where we propose a two-stage hybrid algorithm to identify structural patterns in genomic sequences. At the first stage of the algorithm, an efficient sequential change-point detection procedure (for example, the Shiryaev-Roberts procedure or the cumulative sum control chart (CUSUM) procedure) is applied. Then the obtained locations of the change-points are used to initialize the Cross-Entropy (CE) algorithm, which is an evolutionary stochastic optimization method that estimates both the number of change-points and their corresponding locations. The first stage of the algorithm is very sensitive to the choice of thresholds, and identifying optimal thresholds increases the accuracy of the results and further improves the efficiency of the algorithm. In this study, we propose an improved hybrid algorithm for change-point detection, which uses optimal thresholds for the sequential change-point detection procedure and the CE method to obtain more precise estimates. To illustrate the usefulness of the algorithm, we compare the proposed hybrid algorithms on both artificially generated data and real aCGH experimental data. Our results show that the proposed methodologies are effective in detecting multiple change-points in biological sequences.
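    As a rough illustration of the first stage only, the sketch below runs a two-sided CUSUM detector that restarts after every alarm. It is not the authors' implementation: the function name `cusum_change_points`, the `drift` allowance, the `warmup` baseline window and the restart rule are assumptions made for this example, and the Cross-Entropy refinement stage is not shown.

```python
def cusum_change_points(x, threshold, drift=0.0, warmup=5):
    """Two-sided CUSUM for mean shifts, restarted after each alarm.

    Hypothetical helper for illustration:
      threshold -- alarm level for the CUSUM statistics
      drift     -- allowance subtracted at every step to damp small deviations
      warmup    -- number of points used to estimate each segment's baseline mean
    Returns the list of alarm indices (each lags the true change slightly).
    """
    alarms = []
    start = 0
    n = len(x)
    while start + warmup < n:
        # Estimate the current segment's mean from its first `warmup` points.
        baseline = sum(x[start:start + warmup]) / warmup
        g_pos = 0.0  # accumulates evidence for an upward shift
        g_neg = 0.0  # accumulates evidence for a downward shift
        alarm = None
        for t in range(start + warmup, n):
            dev = x[t] - baseline
            g_pos = max(0.0, g_pos + dev - drift)
            g_neg = max(0.0, g_neg - dev - drift)
            if g_pos > threshold or g_neg > threshold:
                alarm = t
                break
        if alarm is None:
            break  # no further change detected
        alarms.append(alarm)
        start = alarm  # restart: the next baseline comes from post-alarm data
    return alarms


# Noise-free toy signal with true changes at indices 50 and 100.
signal = [0.0] * 50 + [2.0] * 50 + [0.0] * 50
print(cusum_change_points(signal, threshold=5.0, drift=0.5))  # [53, 103]
```

    Lowering `threshold` shortens the detection delay but raises the false-alarm rate on noisy data, which is the threshold sensitivity the abstract refers to. In a hybrid scheme of the kind described, the alarm indices would serve as initial guesses for the second stage.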
    Language: English
    Title of host publication: 22nd International Congress on Modelling and Simulation
    Editors: G. Syme, D. Hatton MacDonald, B. Fulton, J. Piantadosi
    Place of Publication: mssanz.org.au
    Publisher: Modelling & Simulation Society Australia & New Zealand
    Pages: 508-514
    Number of pages: 7
    ISBN (Electronic): 9780987214379
    Publication status: Published - 2017
    Event: International Congress on Modelling and Simulation (22nd : 2017) - Hobart, Australia
    Duration: 3 Dec 2017 to 8 Dec 2017

    Conference

    Conference: International Congress on Modelling and Simulation (22nd : 2017)
    Country: Australia
    City: Hobart
    Period: 3/12/17 to 8/12/17

    Fingerprint

    Entropy
    Genes
    Aberrations
    Gene expression
    Statistical methods
    Control charts

    Keywords

    • Change-point detection
    • aCGH microarray data
    • CNVs
    • DNA copy number
    • combinatorial optimization
    • Cross-Entropy method

    Cite this

    Sofronov, G. Y., Polushina, T. V., & Jayawardana, M. W. (2017). An improved hybrid algorithm for multiple change-point detection in array CGH data. In G. Syme, D. Hatton MacDonald, B. Fulton, & J. Piantadosi (Eds.), 22nd International Congress on Modelling and Simulation (pp. 508-514). mssanz.org.au: Modelling & Simulation Society Australia & New Zealand.
    Sofronov, G. Y. ; Polushina, T. V. ; Jayawardana, M. W. / An improved hybrid algorithm for multiple change-point detection in array CGH data. 22nd International Congress on Modelling and Simulation. editor / G. Syme ; D. Hatton MacDonald ; B. Fulton ; J. Piantadosi. mssanz.org.au : Modelling & Simulation Society Australia & New Zealand, 2017. pp. 508-514
    @inproceedings{9666109a9b984066a223de3da4afe6c0,
    title = "An improved hybrid algorithm for multiple change-point detection in array CGH data",
    keywords = "Change-point detection, aCGH microarray data, CNVs, DNA copy number, combinatorial optimization, Cross-Entropy method",
    author = "Sofronov, {G. Y.} and Polushina, {T. V.} and Jayawardana, {M. W.}",
    year = "2017",
    language = "English",
    pages = "508--514",
    editor = "G. Syme and {Hatton MacDonald}, D. and B. Fulton and J. Piantadosi",
    booktitle = "22nd International Congress on Modelling and Simulation",
    publisher = "Modelling \& Simulation Society Australia \& New Zealand",
    address = "Australia",

    }

    Sofronov, GY, Polushina, TV & Jayawardana, MW 2017, An improved hybrid algorithm for multiple change-point detection in array CGH data. in G Syme, D Hatton MacDonald, B Fulton & J Piantadosi (eds), 22nd International Congress on Modelling and Simulation. Modelling & Simulation Society Australia & New Zealand, mssanz.org.au, pp. 508-514, International Congress on Modelling and Simulation (22nd : 2017), Hobart, Australia, 3/12/17.

    TY - GEN

    T1 - An improved hybrid algorithm for multiple change-point detection in array CGH data

    AU - Sofronov, G. Y.

    AU - Polushina, T. V.

    AU - Jayawardana, M. W.

    PY - 2017

    Y1 - 2017

    KW - Change-point detection

    KW - aCGH microarray data

    KW - CNVs

    KW - DNA copy number

    KW - combinatorial optimization

    KW - Cross-Entropy method

    UR - http://mssanz.org.au/modsim2017/

    M3 - Conference proceeding contribution

    SP - 508

    EP - 514

    BT - 22nd International Congress on Modelling and Simulation

    A2 - Syme, G.

    A2 - Hatton MacDonald, D.

    A2 - Fulton, B.

    A2 - Piantadosi, J.

    PB - Modelling & Simulation Society Australia & New Zealand

    CY - mssanz.org.au

    ER -

    Sofronov GY, Polushina TV, Jayawardana MW. An improved hybrid algorithm for multiple change-point detection in array CGH data. In Syme G, Hatton MacDonald D, Fulton B, Piantadosi J, editors, 22nd International Congress on Modelling and Simulation. mssanz.org.au: Modelling & Simulation Society Australia & New Zealand. 2017. p. 508-514