Abstract
New data collection and storage technologies have given rise to a new field of streaming data analytics, called real-time statistical methodology for online data analyses. Most existing online learning methods are based on homogeneity assumptions, which require the samples in a sequence to be independent and identically distributed. However, inter-data batch correlation and dynamically evolving batch-specific effects are among the key defining features of real-world streaming data such as electronic health records and mobile health data. This article is built under a state-space mixed model framework in which the observed data stream is driven by a latent state process that follows a Markov process. In this setting, online maximum likelihood estimation is made challenging by high-dimensional integrals and complex covariance structures. In this article, we develop a real-time Kalman-filter-based regression analysis method that updates both point estimates and their standard errors for fixed population average effects while adjusting for dynamic hidden effects. Both theoretical justification and numerical experiments demonstrate that our proposed online method has statistical properties similar to those of its offline counterpart and enjoys great computational efficiency. We also apply this method to analyze an electronic health record dataset.
Original language | English (US) |
---|---|
Pages (from-to) | 111-133 |
Number of pages | 23 |
Journal | Canadian Journal of Statistics |
Volume | 51 |
Issue number | 1 |
DOIs | |
State | Published - Mar 2023 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Kalman filter
- dynamic effects
- online learning
- state-space mixed models
- streaming data