High Dimensional Indexing

Research topic: Fast Access to Complex Data

Recent applications demand fast query response times on high dimensional data. For this purpose index structures were introduced. Existing multidimensional indexes like the R-tree provide efficient querying for only relatively few dimensions. Therefore we develop new index structures for efficient retrieval and similarity search.

 

Due to massive overlap of index descriptors, multidimensional indexes degenerate for high dimensions and access the entire data by random I/O. Consequently, the efficiency benefits of indexing are lost. By exploiting inherent properties of the indexed data, our new index structures, the TS-Tree and the OF-Tree, can index high-dimensional data in an overlap-free manner; during query processing, powerful pruning via quantized separator and metadata information greatly reduces the number of pages which have to be accessed, resulting in substantial speed-up.

 

Due to the increasing main memory capacity of modern computers, a high percentage of datasets fits into main memory. We develop novel main memory based index structures that use individual dimensions for each data object by applying the method of subspace clustering. By a local selection of dimensions we increase the information content for objects compared to a global approach; this higher information content enables a better pruning of the search space.