|Abstract:||The ability to extract information from collected data has always driven science. Today.s large computers and automated sensing technologies collect terabytes of data in a few weeks. Extracting information from such large amounts of data is like trying to find a needle in a haystack. For efficient information extraction, we need disk-based indexing schemes that can efficiently handle queries restricting ranges on dozens of attributes. Unfortunately, the unique characteristics of scientific data and queries cause traditional indexing techniques to have poor performance on scientific workloads, occupy excessive space, or both.
Bitmap indexes were proposed as a solution to these problems. However, in experiments with scientific data and queries, we found that previously proposed variations of bitmap indexes either were quite slow or required excessive storage for processing the large-range query conditions our scientists used. Scientists also told us that bitmap indexes, though smaller than traditional indexes, were too large for scientific data warehouses. Our scientists also wanted an efficient method to consolidate the data points returned by the indexes into larger, more meaningful regions of interest.
To address these three problems, we introduced multi-resolution bitmap indexes, which group data into bins at multiple granularities. We achieved a query performance which is 10 times faster than traditional bitmap indexes by using bitmap indexes built at these multiple granularities. To address the issue of size, we introduced an adaptive version of multi-resolution bitmap indexes. The adaptive index adds and drops auxiliary indexes as needed for the query workload and is a fraction of the size of the data being indexed. We achieved a performance improvement of a factor of 6, compared to an ordinary multi-resolution bitmap index of the same size. We also introduced a novel algorithm to consolidate data points into regions of interest. By exploiting the special properties of compressed bitmap indexes and scientific meshes we achieved sublinear running times, with respect to the number of points in the query result, for both the index lookup and region consolidation.