Robust Methods for Exploring Multivariate Data

Project Details

Description

Abstract

PI: David Tyler (DMS-0305858)

Title: Robust Methods for Exploring Multivariate Data

The overall goal of this research project is to develop computationally feasible, conceptually appealing and theoretically defensible robust methods for exploring and making inferences about a multivariate data set. The main class of estimates to be studied is the multivariate redescending M-estimates with auxiliary scale recently introduced by the investigator. This class of estimates is based upon the key idea of partitioning the scatter component into a nuisance 'scale' component and a structural 'shape' component. This partitioning method produces a novel interpretation of robust multivariate estimation problems, and enables concepts from univariate robust statistics and from robust regression to be readily extended to the multivariate setting. In particular, it allows for the generalization of the regression MM-estimates to MM-estimates for multivariate data. Given that the regression MM-estimates are the default robust regression estimates in S-PLUS, theoretical and computational developments for the multivariate MM-estimates are expected to have wide impact as a standard method in the analysis of multivariate data. Aside from the MM-estimates, the redescending M-estimates with auxiliary scale also include the multivariate S-estimates and the multivariate constrained M-estimates. A general unifying study of the robustness properties, including influence functions, relative efficiencies, and maximum bias functions, of these and other multivariate M-estimates with auxiliary scale is to be undertaken. The methods and ideas underlying the robust estimates of multivariate location and scatter are conceptually broad enough to be extended to other settings, such as to multivariate linear models and to structured covariance problems, and such extensions are to be investigated.
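As a rough illustration of the scale/shape partition mentioned above, the following sketch splits a scatter matrix into a scalar scale and a shape matrix. It assumes one common normalization, in which the shape matrix has determinant 1; the abstract does not say which normalization the project uses, and the function name `split_scatter` and the example matrix are hypothetical. It shows only the decomposition, not the M-estimates themselves.

```python
import numpy as np


def split_scatter(scatter):
    """Split a scatter matrix into (scale, shape) with det(shape) = 1.

    Assumed normalization (not stated in the abstract):
        scatter = scale * shape,  scale = det(scatter) ** (1 / p)
    """
    scatter = np.asarray(scatter, dtype=float)
    p = scatter.shape[0]
    # slogdet is numerically safer than det for larger dimensions.
    sign, logdet = np.linalg.slogdet(scatter)
    if sign <= 0:
        raise ValueError("scatter matrix must have a positive determinant")
    scale = np.exp(logdet / p)   # nuisance 'scale' component
    shape = scatter / scale      # structural 'shape' component
    return scale, shape


# Hypothetical 2 x 2 scatter matrix, used only for illustration.
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
scale, shape = split_scatter(sigma)
```

Under this normalization the shape matrix carries the orientation and relative axis lengths of the scatter, while the scale is a single number, which is what lets univariate scale ideas be borrowed in the multivariate setting.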

Multivariate location and scatter play a central role in many classical statistical procedures, such as principal component analysis, discriminant analysis, and canonical correlation analysis, which are routinely applied in such diverse disciplines as psychology, biology, geology, and other fields. Hence, the further development of robust estimates for multivariate location and scatter can have a substantial impact on data analysis methods in these scientific areas. Aside from the intrinsic importance of robust estimates of multivariate location and scatter, such estimates also serve as an important first step to a deeper analysis of a high dimensional data set. The development of exploratory methods for multivariate data based on the redescending M-estimates with auxiliary scale is another primary goal of this research project. Such exploratory methods for high dimensional data are pertinent to contemporary data problems arising, for example, in data mining and in image data. For such data problems, the classical model of data arising as signal plus noise is inappropriate and the data is better viewed as arising as signal plus noise embedded within a mass of clutter. Robust multivariate methods are particularly apt for this latter view of data. The investigator has noted important links between this methodology and methodologies developed in other areas such as cluster analysis and computer vision. A deeper investigation into these links will be undertaken.

Status: Finished
Effective start/end date: 6/1/03 → 8/31/06

Funding

  • National Science Foundation: $211,155.00
