Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints

Yu Jiang, Dong Deng, Jiannan Wang, Guoliang Li, Jianhua Feng

Research output: Chapter in Book/Report/Conference proceedingConference contribution

18 Scopus citations

Abstract

The quantity of data in real-world applications is growing significantly while the data quality is still a big problem. Similarity search and similarity join are two important operations to address the poor data quality problem. Although many similarity search and join algorithms have been proposed, they did not utilize the abilities of modern hardware with multi-core processors. It calls for new parallel algorithms to enable multi-core processors to meet the high performance requirement of similarity search and join on big data. To this end, in this paper we propose parallel algorithms to support efficient similarity search and join with edit-distance constraints. We adopt the partition-based framework and extend it to support parallel similarity search and join on multi-core processors. We also develop two novel pruning techniques. We have implemented our algorithms and the experimental results on two real datasets show that our parallel algorithms achieve high performance and obtain good speedup.

Original languageEnglish (US)
Title of host publicationJoint EDBT/ICDT 2013 Workshops - Proceedings
Pages341-348
Number of pages8
DOIs
StatePublished - May 2 2013
Externally publishedYes
EventJoint EDBT/ICDT 2013 Workshops - Genoa, Italy
Duration: Mar 18 2013Mar 22 2013

Publication series

NameACM International Conference Proceeding Series

Conference

ConferenceJoint EDBT/ICDT 2013 Workshops
CountryItaly
CityGenoa
Period3/18/133/22/13

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • content filter
  • parallel algorithms
  • similarity join
  • similarity search

Fingerprint Dive into the research topics of 'Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints'. Together they form a unique fingerprint.

Cite this