A sliced inverse regression approach for data stream

Marie Chavent, Stéphane Girard, Vanessa Kuentz-Simonet, Benoit Liquet, Thi Mong Ngoc Nguyen, Jérôme Saracco*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

In this article, we focus on data arriving sequentially by blocks in a stream. A semiparametric regression model involving a common effective dimension reduction (EDR) direction is assumed in each block. Our goal is to estimate this direction at each arrival of a new block. A simple direct approach consists of pooling all the observed blocks and estimating the EDR direction by the sliced inverse regression (SIR) method. In practice, however, this approach has drawbacks, such as the need to store all the blocks and long running times for large sample sizes. To overcome these drawbacks, we propose an adaptive SIR estimator of the EDR direction based on the optimization of a quality measure. The corresponding approach is faster both in terms of computational complexity and running time, and provides data storage benefits. The consistency of our estimator is established and its asymptotic distribution is given. An extension to multiple-index models is proposed. A graphical tool is also provided to detect changes in the underlying model, i.e., drift in the EDR direction or aberrant blocks in the data stream. A simulation study illustrates the numerical behavior of our estimator. Finally, an application to real data concerning the estimation of physical properties of the Mars surface is presented.
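For orientation, the "simple direct approach" mentioned in the abstract (pool every block, then run SIR once) can be sketched as follows. This is a minimal illustration of classical single-index SIR only, not the adaptive estimator proposed in the article. The function names `sir_direction` and `pooled_sir`, the default of 10 slices, and the representation of a block as an `(X, y)` pair are assumptions made for this sketch.

```python
import numpy as np

def sir_direction(X, y, n_slices=10):
    """Classical SIR estimate of a single EDR direction (illustrative sketch)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # center the covariates
    Sigma = Xc.T @ Xc / n                # empirical covariance of X

    # Slice the sample into groups of roughly equal size along sorted y.
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of the slice means of X (the "inverse regression" curve).
    M = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvector of Sigma^{-1} M, obtained from a symmetric
    # eigenproblem via the Cholesky factor Sigma = L L^T.
    L = np.linalg.cholesky(Sigma)
    A = np.linalg.solve(L, np.linalg.solve(L, M).T)   # L^{-1} M L^{-T}
    _, V = np.linalg.eigh(A)                          # eigenvalues ascending
    b = np.linalg.solve(L.T, V[:, -1])                # map back to the X scale
    return b / np.linalg.norm(b)

def pooled_sir(blocks, n_slices=10):
    """Pool all (X, y) blocks seen so far and run SIR on the pooled sample."""
    X = np.vstack([Xb for Xb, _ in blocks])
    y = np.concatenate([yb for _, yb in blocks])
    return sir_direction(X, y, n_slices)
```

Calling `pooled_sir(blocks)` after each new block makes the drawbacks named in the abstract concrete: every past block must be kept in memory, and the whole pooled sample is reprocessed at each arrival. The estimated direction is identified only up to sign and scale, so it is normalized to unit length here.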

Original language: English
Pages (from-to): 1129-1152
Number of pages: 24
Journal: Computational Statistics
Volume: 29
Issue number: 5
Publication status: Published - Oct 2014
Externally published: Yes

Keywords

  • Effective dimension reduction (EDR)
  • Sliced inverse regression (SIR)
  • Data stream
