Theory of the GMM Kernel

Ping Li, Cun Hui Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

20 Scopus citations

Abstract

In web search, data mining, and machine learning, two popular measures of data similarity are the cosine and the resemblance (the latter is for binary data). In this study, we develop theoretical results for both the cosine and the GMM (generalized min-max) kernel [26], which is a generalization of the resemblance. GMM has direct applications in machine learning as a positive definite kernel and can be efficiently linearized via probabilistic hashing to handle big data. Owing to its discrete nature, the hashed values can also be used to build hash tables for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a general family of distributions and includes the multivariate normal and t-distribution as special cases. The consistency result holds as long as the data have bounded first moment (an assumption which typically holds for data commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. We also prove the limit of cosine under elliptical distributions. In comparison, the consistency of GMM requires much weaker conditions. For example, when data follow a t-distribution with ν degrees of freedom, GMM typically provides a better estimate of similarity than cosine when ν < 8 (ν = 8 means the distribution is very close to normal). These theoretical results help explain the recent success of GMM and lay the foundation for further research.

Original languageEnglish (US)
Title of host publication26th International World Wide Web Conference, WWW 2017
PublisherInternational World Wide Web Conferences Steering Committee
Pages1053-1062
Number of pages10
ISBN (Print)9781450349130
DOIs
StatePublished - 2017
Event26th International World Wide Web Conference, WWW 2017 - Perth, Australia
Duration: Apr 3 2017Apr 7 2017

Publication series

Name26th International World Wide Web Conference, WWW 2017

Other

Other26th International World Wide Web Conference, WWW 2017
Country/TerritoryAustralia
CityPerth
Period4/3/174/7/17

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Theory of the GMM Kernel'. Together they form a unique fingerprint.

Cite this