Projection pursuit describes a procedure for searching high-dimensional data for “interesting” low-dimensional projections via the optimization of a criterion function called the projection pursuit index. By empirically examining the optimization process for several projection pursuit indexes, we observed differences in the types of structure that maximized each index. We were especially curious about differences between two indexes based on expansions in terms of orthogonal polynomials: the Legendre index and the Hermite index. Being fast to compute, these indexes are ideally suited for dynamic graphics implementations. Both the Legendre and Hermite indexes are weighted L2 distances between the density of the projected data and a standard normal density. A general form for this type of index is introduced that encompasses both indexes. The form clarifies the effects of the weight function on the index’s sensitivity to differences from normality, highlighting some conceptual problems with the Legendre and Hermite indexes. A new index, called the Natural Hermite index, which alleviates some of these problems, is introduced. A polynomial expansion of the data density reduces the form of the index to a sum of squares of the coefficients used in the expansion. This drew our attention to these coefficients as indexes in their own right. We found that the first two coefficients, and the lowest-order indexes produced by them, are the most useful ones for practical data exploration because they respond to structure that can be analytically identified, and because they have “long-sighted” vision that enables them to “see” large structure from a distance. Complementing this low-order behavior, the higher-order indexes are “short-sighted.” They are able to see intricate structure, but only when they are close to it.
We also show some practical use of projection pursuit using the polynomial indexes, including a discovery of previously unseen structure in a set of telephone usage data, and two cautionary examples that illustrate that structure found is not always meaningful.
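As an illustration of the "weighted L2 distance" idea, the following minimal Python sketch evaluates an index of the form: integral of (f(y) - phi(y))^2 w(y) dy, where f is the density of the projected data, phi is the standard normal density, and w is a weight function. Several things here are assumptions and not taken from the abstract: the three weight functions (1/(2 phi) for Legendre, 1 for Hermite, phi for Natural Hermite) are the forms commonly quoted for these indexes; the density f is estimated with a Gaussian kernel estimate and integrated numerically on a truncated grid, which is a stand-in for the polynomial expansions the abstract describes; and all function and variable names are hypothetical.

```python
import numpy as np


def std_normal_pdf(y):
    """Standard normal density phi(y)."""
    return np.exp(-0.5 * y ** 2) / np.sqrt(2.0 * np.pi)


def kernel_density(sample, grid, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return std_normal_pdf(z).mean(axis=1) / bandwidth


# Assumed weight functions (not stated in the abstract).
WEIGHTS = {
    "legendre": lambda y: 1.0 / (2.0 * std_normal_pdf(y)),  # up-weights tails
    "hermite": lambda y: np.ones_like(y),                    # unweighted
    "natural_hermite": std_normal_pdf,                       # up-weights center
}


def weighted_l2_index(projected, weight, half_width=4.0, n_grid=801):
    """Numerically integrate (f - phi)^2 * weight over [-half_width, half_width]."""
    projected = np.asarray(projected, dtype=float)
    # Standardize so the comparison with N(0, 1) is meaningful.
    z = (projected - projected.mean()) / projected.std()
    grid = np.linspace(-half_width, half_width, n_grid)
    # Normal-reference rule-of-thumb bandwidth for unit-variance data.
    bandwidth = 1.06 * z.size ** (-0.2)
    f = kernel_density(z, grid, bandwidth)
    integrand = (f - std_normal_pdf(grid)) ** 2 * weight(grid)
    # Trapezoidal rule on the uniform grid.
    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dx)


# Two synthetic one-dimensional "projections": one Gaussian, one bimodal.
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=2000)
bimodal_sample = np.concatenate(
    [rng.normal(-2.0, 0.5, 1000), rng.normal(2.0, 0.5, 1000)]
)

for name, w in WEIGHTS.items():
    print(
        name,
        weighted_l2_index(normal_sample, w),
        weighted_l2_index(bimodal_sample, w),
    )
```

Under each weight the bimodal projection should score higher than the Gaussian one. Changing the weight changes which region of the projected density dominates the index, which is the sensitivity effect the general form is meant to expose.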
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty
Keywords
- Density estimation
- Exploratory multivariate data analysis
- Principal component analysis