Multivariate online regression analysis with heterogeneous streaming data

Lan Luo, Peter X.K. Song

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

New data collection and storage technologies have given rise to a new field of streaming data analytics, called real-time statistical methodology for online data analyses. Most existing online learning methods are based on homogeneity assumptions, which require the samples in a sequence to be independent and identically distributed. However, inter-data batch correlation and dynamically evolving batch-specific effects are among the key defining features of real-world streaming data such as electronic health records and mobile health data. This article is built under a state-space mixed model framework in which the observed data stream is driven by a latent state process that follows a Markov process. In this setting, online maximum likelihood estimation is made challenging by high-dimensional integrals and complex covariance structures. In this article, we develop a real-time Kalman-filter-based regression analysis method that updates both point estimates and their standard errors for fixed population average effects while adjusting for dynamic hidden effects. Both theoretical justification and numerical experiments demonstrate that our proposed online method has statistical properties similar to those of its offline counterpart and enjoys great computational efficiency. We also apply this method to analyze an electronic health record dataset.
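
As an illustration of the kind of batch-wise updating the abstract describes, the following is a minimal sketch of a textbook Kalman filter applied to a toy streaming regression. It is not the authors' estimator. It assumes a linear Gaussian model with one scalar batch-specific latent effect following an AR(1) process, and it assumes the variance parameters (`rho`, `state_var`, `noise_var`) are known. The function names, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def kalman_batch_update(mean, cov, X, y, rho, state_var, noise_var):
    """One predict/update step of a standard Kalman filter for a batch (X, y).

    Assumed state vector: [beta (fixed effects), theta (latent batch effect)].
    """
    p = X.shape[1]

    # Predict: beta is constant across batches, theta follows an AR(1) process.
    F = np.eye(p + 1)
    F[p, p] = rho
    Q = np.zeros((p + 1, p + 1))
    Q[p, p] = state_var
    mean_pred = F @ mean
    cov_pred = F @ cov @ F.T + Q

    # Update: assumed observation model y = X beta + theta + noise.
    H = np.hstack([X, np.ones((X.shape[0], 1))])
    S = H @ cov_pred @ H.T + noise_var * np.eye(X.shape[0])
    K = np.linalg.solve(S, H @ cov_pred).T  # Kalman gain: cov_pred H^T S^{-1}
    mean_new = mean_pred + K @ (y - H @ mean_pred)
    cov_new = (np.eye(p + 1) - K @ H) @ cov_pred
    return mean_new, cov_new


def simulate_and_fit(n_batches=200, batch_size=20, seed=0):
    """Stream simulated batches through the filter; each batch is used once."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, -2.0])  # hypothetical fixed effects
    rho, state_var, noise_var = 0.8, 0.25, 1.0  # assumed known
    p = beta_true.size

    mean = np.zeros(p + 1)
    cov = 100.0 * np.eye(p + 1)  # vague prior on the state
    theta = 0.0
    for _ in range(n_batches):
        theta = rho * theta + rng.normal(scale=np.sqrt(state_var))
        X = rng.normal(size=(batch_size, p))
        noise = rng.normal(scale=np.sqrt(noise_var), size=batch_size)
        y = X @ beta_true + theta + noise
        mean, cov = kalman_batch_update(mean, cov, X, y, rho, state_var, noise_var)

    se = np.sqrt(np.diag(cov)[:p])  # standard errors of the fixed effects
    return mean[:p], se, beta_true


if __name__ == "__main__":
    beta_hat, se, beta_true = simulate_and_fit()
    print("estimate:", beta_hat)
    print("std. err.:", se)
```

Only the current state mean and covariance are carried between batches, so memory use does not grow with the length of the stream. In this sketch the standard errors are read from the diagonal of the filtered covariance.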

Original language: English (US)
Pages (from-to): 111-133
Number of pages: 23
Journal: Canadian Journal of Statistics
Volume: 51
Issue number: 1
State: Published - Mar 2023
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Kalman filter
  • dynamic effects
  • online learning
  • state-space mixed models
  • streaming data
