The need to analyze multivariate data arises in many disciplines, including computer science, engineering, meteorology, chemometrics, psychology, sociology, biology, and genetics. A primary goal of multivariate statistical analysis is to model and understand the complex interrelationships among different measurements or variables. With current trends in the sciences, it is increasingly common to collect large amounts of information on each sample point or experimental unit, even though the number of sample points or experimental units may remain relatively small. This yields an extremely large number of parameters, or interrelationships between variables, to consider, but too little data to model these relationships adequately with classical statistical methods. This research project aims to investigate novel ways to model such high-dimensional data from relatively small samples. Another issue that arises when many measurements are recorded on each sample point is that of large errors, or outliers, in the measurements. Conclusions based on classical statistical methods may be suspect if such outliers go undetected. For high-dimensional data, however, detecting outliers is known to be problematic, so an alternative is to use robust statistical methods, that is, methods that produce valid conclusions even if the data contain some bad data points. The robustness of the statistical methods developed within the research project will be evaluated.

This project will use penalization methods, which have a long history within statistics, to develop models and estimation procedures for high-dimensional covariance matrices. It has long been recognized that the larger and smaller eigenvalues of a sample covariance matrix are heavily biased upward and downward, respectively, even for moderately large sample sizes.
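The eigenvalue bias is easy to see in a small simulation. The following sketch, in Python with NumPy, draws n = 100 observations from a 50-dimensional standard normal distribution, so every true eigenvalue equals 1. The dimension, sample size, and seed are illustrative choices, not values taken from the project. For p/n = 0.5, the Marchenko-Pastur law predicts that, as the dimensions grow, the sample eigenvalues spread over roughly [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2], or about 0.09 to 2.91.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
p, n = 50, 100                          # dimension and sample size (illustrative values)
X = rng.standard_normal((n, p))         # n draws from N(0, I_p): every true eigenvalue is 1
S = X.T @ X / n                         # sample covariance (the mean is known to be zero)
eig = np.linalg.eigvalsh(S)             # eigenvalues in ascending order

print(f"largest sample eigenvalue:  {eig[-1]:.2f}")
print(f"smallest sample eigenvalue: {eig[0]:.2f}")
# Asymptotic Marchenko-Pastur support for the ratio p/n
lower = (1 - np.sqrt(p / n)) ** 2
upper = (1 + np.sqrt(p / n)) ** 2
print(f"Marchenko-Pastur range: [{lower:.2f}, {upper:.2f}]")
```

The largest sample eigenvalue lands well above 1 and the smallest well below it, even though the sample size is twice the dimension.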
This bias can be addressed with penalization methods that shrink the eigenvalues together. Such shrinkage, however, cannot be accomplished with the usual penalties, which are convex functions of the precision matrix. This project will instead employ geodesically convex penalties. Furthermore, some novel non-smooth geodesically convex penalties will be introduced, which not only shrink the eigenvalues together but also have a lasso-type effect of creating subsets of equal eigenvalues. This non-smooth penalization approach thus yields a model selection method, or more specifically, a multi-spiked covariance model selection method. The geodesically convex penalization approach will first be developed under the classical multivariate normal setting. Methods developed under this setting, however, are well known to perform poorly if the multivariate normal model does not hold. A simple and often-used approach for making classical methods more robust is the plug-in method, that is, simply replacing the sample covariance matrix in a method with a robust alternative. For modest sample sizes relative to the dimension of the data, however, such plug-in methods tend not to differ greatly in performance from those using the sample covariance matrix. To address this shortcoming, non-smooth penalized M-estimators of the covariance matrix will be developed and studied. Here, the concept of geodesic convexity plays a crucial role.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
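As a rough illustration of how a geodesically convex penalty can shrink eigenvalues together, the following minimal sketch in Python with NumPy minimizes log det(Sigma) + tr(Sigma^{-1} S) + alpha * sum_i (log l_i - mean_j log l_j)^2, where S is the sample covariance and l_i are the eigenvalues of Sigma. The smooth log-eigenvalue penalty, the function name `shrink_log_eigenvalues`, and its parameters are hypothetical choices for this example; they are not the non-smooth penalties proposed in the project. Because the penalty depends only on the eigenvalues, the minimizer shares the eigenvectors of S. In the coordinates u_i = log l_i the objective becomes sum_i (u_i + s_i exp(-u_i)) + alpha * sum_i (u_i - mean(u))^2, an ordinary convex function, even though the penalty is not convex in the precision matrix. Moving linearly in u with the eigenvectors held fixed traces a geodesic of the affine-invariant metric on positive definite matrices, so this is a simple special case of geodesic convexity.

```python
import numpy as np


def shrink_log_eigenvalues(S, alpha, n_outer=500, n_bisect=100, tol=1e-10):
    """Hypothetical sketch: minimize
        log det(Sigma) + tr(Sigma^{-1} S) + alpha * sum_i (log l_i - mean log l)^2
    over positive definite Sigma. S must be positive definite.
    Returns (Sigma_hat, eigenvalues of Sigma_hat in ascending order)."""
    s, V = np.linalg.eigh(S)            # sample eigenvalues (ascending) and eigenvectors
    log_s = np.log(s)
    u = log_s.copy()                    # start from the unpenalized solution u_i = log s_i
    for _ in range(n_outer):
        # Block coordinate descent: fix the common center c = mean(u), then
        # solve 1 - s_i exp(-u_i) + 2 alpha (u_i - c) = 0 for every i.
        c = u.mean()
        # The left-hand side increases in u_i, and its root lies between log s_i and c.
        lo = np.minimum(log_s, c)
        hi = np.maximum(log_s, c)
        for _ in range(n_bisect):       # vectorized bisection over all eigenvalues
            mid = 0.5 * (lo + hi)
            g = 1.0 - s * np.exp(-mid) + 2.0 * alpha * (mid - c)
            lo = np.where(g < 0, mid, lo)
            hi = np.where(g < 0, hi, mid)
        u_new = 0.5 * (lo + hi)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    lam = np.exp(u)
    return (V * lam) @ V.T, lam         # V diag(lam) V^T


# Usage: 50-dimensional standard normal data, n = 100 (illustrative values)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
S = X.T @ X / 100
Sigma_hat, lam = shrink_log_eigenvalues(S, alpha=1.0)
print("sample extremes:   ", np.linalg.eigvalsh(S)[[0, -1]])
print("penalized extremes:", lam[[0, -1]])
```

With alpha = 0 the sketch returns S itself; increasing alpha pulls the log-eigenvalues toward their common mean, compressing the spread of the sample eigenvalues. A non-smooth penalty of the lasso type would instead be needed to make some eigenvalues exactly equal, which this smooth example does not do.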
Effective start/end date: 7/1/18 → 6/30/21
- National Science Foundation (NSF)