TY - GEN
T1 - Using betweenness centrality to identify manifold shortcuts
AU - Cukierski, William J.
AU - Foran, David J.
PY - 2008
Y1 - 2008
N2 - High-dimensional data presents a significant challenge to a broad spectrum of pattern recognition and machinelearning applications. Dimensionality reduction (DR) methods serve to remove unwanted variance and make such problems tractable. Several nonlinear DR methods, such as the well known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs may sometimes contain unwanted edges which connect disparate regions of one or more manifolds. This topological sensitivity is well known [1, 9, 13], yet managing high-dimensional, noisy data in the absence of a priori knowledge, remains an open and difficult problem. This manuscript introduces a divisive, edge-removal method based on graph betweenness centrality which can robustly identify manifold-shorting edges. The problem of graph construction in high dimensions is discussed and the proposed algorithm is inserted into the ISOMAP workflow. ROC analysis is performed and the performance is tested on both synthetic and real datasets.
AB - High-dimensional data presents a significant challenge to a broad spectrum of pattern recognition and machinelearning applications. Dimensionality reduction (DR) methods serve to remove unwanted variance and make such problems tractable. Several nonlinear DR methods, such as the well known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs may sometimes contain unwanted edges which connect disparate regions of one or more manifolds. This topological sensitivity is well known [1, 9, 13], yet managing high-dimensional, noisy data in the absence of a priori knowledge, remains an open and difficult problem. This manuscript introduces a divisive, edge-removal method based on graph betweenness centrality which can robustly identify manifold-shorting edges. The problem of graph construction in high dimensions is discussed and the proposed algorithm is inserted into the ISOMAP workflow. ROC analysis is performed and the performance is tested on both synthetic and real datasets.
UR - http://www.scopus.com/inward/record.url?scp=62449182648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=62449182648&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2008.39
DO - 10.1109/ICDMW.2008.39
M3 - Conference contribution
AN - SCOPUS:62449182648
SN - 9780769535036
T3 - Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
SP - 949
EP - 958
BT - Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
T2 - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
Y2 - 15 December 2008 through 19 December 2008
ER -