BIGDATA: F: Statistical Learning with Large Dynamic Tensor Data

Description

Time series analysis is used mainly to discover the dependence and dynamic structure of observations over time and to predict future outcomes of such data accurately. In the big-data era, modern data collection capabilities have led to massive amounts of time series data. Large tensor (or multi-dimensional array) data are now routinely collected in a wide range of applications. For example, a group of countries will report a set of economic indicators each quarter, forming a matrix (2-dimensional array) time series, with each column representing a country and each row representing an economic indicator. The import and export volumes of different types of goods for a group of countries over time form a 3-dimensional array time series. The aim of the project is to lay a foundation and develop a general framework to systematically study the dynamics of such tensor systems, decipher the joint behavior of the individual time series in the tensor array, and provide methods for accurate prediction. The framework will include general and specific statistical models, practical applications, statistical methods and their theoretical and empirical properties, computational algorithms and software, and applications to several data sets. The research can be applied to areas ranging from finance and economics, environmental sciences, and human behavior (e.g., social networks) to neuroscience and engineering. The project also addresses the training and education of future data scientists.
In the big-data era, large tensor time series are routinely observed in a wide range of applications. This project aims to develop state-of-the-art statistical tools to effectively and efficiently extract useful information from such big complex data. The work concerns a general framework of statistical learning with large dynamic tensor data.
Specifically, the project will develop a general class of tensor factor models, with modifications for specific applications, for modeling matrix- and tensor-valued time series, dynamic networks, and spatio-temporal data. The results are expected to be directly applicable to economic tensor data, import-export volume time series, dynamic social networks, pollution monitoring, problems in fluid dynamics, and dynamic brain connectivity networks. Model estimation procedures, along with their theoretical foundations, will be developed. The research will enrich the toolkit of statistical learning for a highly important and widely encountered class of big-data problems. The project also involves research training of graduate and undergraduate students in the field of statistical learning and its applications. The project will develop and disseminate free software, including an array of cleaned data sets for research, and a permanently maintained website as a hub for dissemination of future dynamic tensor research. An international conference on large dynamic tensor analysis will be organized. Evaluation of the computational algorithms and implementation of the methods for large-scale applications will leverage cloud computing resources provided through an agreement between commercial cloud service providers and NSF for the BIGDATA solicitation.
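As a rough illustration of the data structures described above, the following Python sketch stores a matrix time series as a 3-dimensional NumPy array (time by indicator by country) and simulates it from one common form of matrix factor model, X_t = R F_t C' + E_t. This model form, the dimensions, the AR(1) factor dynamics, the noise level, and all variable names are assumptions chosen for illustration; the project description does not specify its models at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 quarters, 5 economic indicators (rows), 4 countries (columns).
T, p, q = 40, 5, 4
# Hypothetical latent factor dimensions, much smaller than p and q.
k1, k2 = 2, 2

# Assumed loading matrices: R maps row (indicator) factors, C maps column (country) factors.
R = rng.normal(size=(p, k1))
C = rng.normal(size=(q, k2))

# Latent factor matrices F_t with simple AR(1) dynamics (an illustrative choice).
F = np.zeros((T, k1, k2))
for t in range(1, T):
    F[t] = 0.8 * F[t - 1] + rng.normal(size=(k1, k2))

# Signal part R F_t C' for every t at once; result has shape (T, p, q).
signal = np.einsum("ik,tkl,jl->tij", R, F, C)

# Observed matrix time series: low-rank signal plus idiosyncratic noise.
X = signal + 0.1 * rng.normal(size=(T, p, q))

# A 3-dimensional array time series adds one more axis, for example
# (time, goods type, exporter, importer); this axis ordering is an assumption.
trade = np.zeros((T, 3, q, q))

print(X.shape, trade.shape)
```

Here `X[t]` is the indicator-by-country matrix observed in quarter `t`, and `X[:, i, j]` is the ordinary univariate time series of indicator `i` for country `j`. The point of the factor structure is dimension reduction: each p-by-q observation is driven by a k1-by-k2 latent matrix, so the joint dynamics can be studied through far fewer series.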
Status: Active
Effective start/end date: 9/1/17 to 8/31/20

Funding

  • National Science Foundation (NSF)

Fingerprint

Tensors
Time series
Economics
Time series analysis
Finance
Cloud computing
Fluid dynamics