Coding for random projections

Ping Li, Michael Mitzenmacher, Anshumali Shrivastava

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Scopus citations

Abstract

The method of random projections has become popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which determines the number of bits needed for each projected value and how to allocate these bits, can significantly improve the effectiveness of the algorithm, in storage cost as well as computational speed. In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. We demonstrate that uniform quantization outperforms the standard and influential method (Datar et al., 2004), which used a window-and-random offset scheme. Indeed, we argue that in many cases coding with just a small number of bits suffices. Furthermore, we also develop a non-uniform 2-bit coding scheme that generally performs well in practice, as confirmed by our experiments on training linear support vector machines (SVM). Proofs and additional experiments are available at arXiv:1308.2218. In the context of using coded random projections for approximate near neighbor search by building hash tables (arXiv:1403.8144) (Li et al., 2014), we show that the step of random offset in (Datar et al., 2004) is again not needed and may hurt the performance. Furthermore, we show that, unless the target similarity level is high, it usually suffices to use only 1 or 2 bits to code each hashed value for this task. Section 7 presents some experimental results for LSH.

Original languageEnglish (US)
Title of host publication31st International Conference on Machine Learning, ICML 2014
PublisherInternational Machine Learning Society (IMLS)
Pages2137-2145
Number of pages9
ISBN (Electronic)9781634393973
StatePublished - 2014
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: Jun 21 2014Jun 26 2014

Publication series

Name31st International Conference on Machine Learning, ICML 2014
Volume3

Other

Other31st International Conference on Machine Learning, ICML 2014
Country/TerritoryChina
CityBeijing
Period6/21/146/26/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Coding for random projections'. Together they form a unique fingerprint.

Cite this