On efficient low distortion ultrametric embedding

Vincent Cohen-Addad, C. S. Karthik, Guillaume Lagarde

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

A classic problem in unsupervised learning and data analysis is to find simpler and easy-to-visualize representations of the data that preserve its essential properties. A widely-used method to preserve the underlying hierarchical structure of the data while reducing its complexity is to find an embedding of the data into a tree or an ultrametric, but computing such an embedding on a data set of n points in Ω(log n) dimensions incurs a quite prohibitive running time of Θ(n2). In this paper, we provide a new algorithm which takes as input a set of points P in Rd, and for every c ≥ 1, runs in time n1+ c ρ 2 (for some universal constant ρ > 1) to output an ultrametric ∆ such that for any two points u, v in P, we have ∆(u, v) is within a multiplicative factor of 5c to the distance between u and v in the “best” ultrametric representation of P. Here, the best ultrametric is the ultrametric ∆e that minimizes the maximum distance distortion with respect to the l2 distance, namely that minimizes max ∆e (u,v)/ku−vk2. u,v∈P We complement the above result by showing that under popular complexity theoretic assumptions, for every constant ε > 0, no algorithm with running time n2−ε can distinguish between inputs in l-metric that admit isometric embedding and those that incur a distortion of 3/2. Finally, we present empirical evaluation on classic machine learning datasets and show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.

Original languageEnglish (US)
Title of host publication37th International Conference on Machine Learning, ICML 2020
EditorsHal Daume, Aarti Singh
PublisherInternational Machine Learning Society (IMLS)
Pages2056-2066
Number of pages11
ISBN (Electronic)9781713821120
StatePublished - 2020
Externally publishedYes
Event37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: Jul 13 2020Jul 18 2020

Publication series

Name37th International Conference on Machine Learning, ICML 2020
VolumePartF168147-3

Conference

Conference37th International Conference on Machine Learning, ICML 2020
CityVirtual, Online
Period7/13/207/18/20

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'On efficient low distortion ultrametric embedding'. Together they form a unique fingerprint.

Cite this