Abstract
The problem examined is declustering, that is, how to distribute a binary Cartesian product file on multiple disks to maximize the parallelism for partial match queries. Cartesian product files appear as a result of some secondary key access methods, such as the multiattribute hashing [12], the grid file [8], etc. For the binary case, the problem is reduced into grouping the 2n binary strings on n bits in m groups of unsimilar strings. The main idea proposed in this paper is to group the strings such that these group forms an error correcting code (ECC). This construction guarantees that the strings of a given group will have large Hamming distances, i.e., they will differ in many bit positions. Intuitively, this should result into good declustering. We briefly mention previous heuristics for declustering, we describe how exactly to build a declustering scheme using an ECC, and we prove a theorem that gives a necessary condition for our method to be optimal. Analytical results show that our method is superior to older heuristics, and that it is very close to the theoretical (nontight) bound. CR Categories: E.1 [Data Structures]; E.4 [Coding and Information Theory]; E.5 [Files]; H.2.2 [Data Base Management]: Physical Design — Access Methods; H.2.6 [Data Base Management]: Database Machines; H.3.2 [Information Storage and Retrieval]: Information Storage; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval.
Original language | English (US) |
---|---|
Pages (from-to) | 907-914 |
Number of pages | 8 |
Journal | IEEE Transactions on Computers |
Volume | 40 |
Issue number | 8 |
DOIs | |
State | Published - Aug 1991 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics
Keywords
- Declustering
- disk allocation
- error correcting codes
- hashing