HECURA: COLABORATIVE: MULTIDIMENSIONAL AND STRING INDEXES FOR STREAMING DATA

Project Details

Description

This research project aims to understand and develop systems formaintaining superlinear indexes for streaming data. A superlinearindex provides search capability over an abstract space that cannoteasily be linearized (totally ordered). In contrast, a linear index,typified by a B-tree, supports point and range queries on totallyordered data.Examples of superlinear indexes include multidimensional indexes,which can be over a geometric domain, such as geographic data, orwhich can be over multiple linear indexes; and full text queries,which can include searching for a particular word or substring.The superlinear indexes found in today's databases cannot support highrates of insertion. On traditional mechanical disk drives, theexisting superlinear indexes can only support about one hundredinsertions per second in the worst case. For many importantapplications, that is too slow, and so database users often avoidsuperlinear indexing. Even traditional linear indexes based onB-trees cannot support the high insertion rates demanded by manydatabases.This research investigates streaming superlinear indexes, that is,indexes that efficiently support full text or multidimensionalqueries, and can be updated at speeds that are related to diskbandwidth rather than seeks per second.Among the significant research issues are the following: (1) designefficient files structures for streaming superlinear indexes; (2)investigate how streaming superlinear indexes might pave the way toimproved file systems; (3) determine whether cache-obliviousalgorithms technology can enhance streaming superlinear indexes; and(4) program complex data structures for transactions and recovery.If successful, this research will show how to build filesystems thatachieve dramatically better performance than today's B-tree-basedfilesystems, how to maintain rich geometrical data andmultidimensional nongeographical databases in real time, and how tomaintain full-text searchable databases in real time. For example,some of today's file systems try to maintain an full-text index tofind strings in files quickly, but these systems often fall behind athigh data write rates. A streaming superlinear index would allow sucha file system to keep up, and would improve the usability of bothhigh-end storage systems and relatively small consumer storage systemsthat are nonetheless too large to index with today's indexes.The researchers are developing course materials on streaming indexingtechnology which will be made freely available under the MITOpenCourseWare initiative (http://ocw.mit.edu).Further information on this project may be found at the projectweb page: http://supertech.csail.mit.edu/superlinear-indexes
StatusFinished
Effective start/end date9/1/098/31/12

Funding

  • National Science Foundation (National Science Foundation (NSF))

Fingerprint Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.