Colibri: Fast mining of large static and dynamic graphs

Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos

Research output: Chapter in Book/Report/Conference proceedingConference contribution

94 Scopus citations

Abstract

Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a non-orthogonal basis by sampling the columns and/or rows of the sparse matrix. However, these approaches will typically produce overcomplete bases, which wastes both space and time. In this paper we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a non-redundant basis and we prove that it has no loss of accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).

Original languageEnglish (US)
Title of host publicationKDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining
Pages686-694
Number of pages9
DOIs
StatePublished - 2008
Externally publishedYes
Event14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008Aug 27 2008

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/TerritoryUnited States
CityLas Vegas, NV
Period8/24/088/27/08

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • Graph mining
  • Low-rank approximation
  • Scalability

Fingerprint

Dive into the research topics of 'Colibri: Fast mining of large static and dynamic graphs'. Together they form a unique fingerprint.

Cite this