Subspace Clustering for Indexing High Dimensional Data: A Main Memory Index based on Local Reductions and Individual Multi-Representations

Fast similarity search in high dimensional feature spaces is crucial in today's applications. Since the performance of traditional index structures degrades with increasing dimensionality, concepts were developed to cope with this curse of dimensionality. Most of the existing concepts exploit global correlations between dimensions to reduce the dimensionality of the feature space. In high dimensional data, however, correlations are often locally constrained to a subset of the data and every object can participate in several of these correlations. Accordingly, discarding the same set of dimensions for each object based on global correlations and ignoring the di erent correlations of single objects leads to signi cant loss of information. These aspects are relevant due to the direct correspondence between the degree of information preserved and the achievable query performance.


We introduce a novel main memory index structure with increased information content for each single object compared to a global approach. This is achieved by using individual
dimensions for each data object by applying the method of subspace clustering. The structure of our index is based on a multi-representation of objects rejecting their multiple correlations; that is, besides the general increase of information per object, we provide several individual representations for each single data object. These multiple views correspond to diff erent local reductions per object and enable more e ffective pruning. In thorough experiments on real and synthetic data, we demonstrate that our novel solution achieves low query times and outperforms existing approaches designed for high dimensional data.

Authors: Günnemann S., Kremer H., Lenhard D., Seidl T.
Published in: Proc. International Conference on Extending Database Technology (EDBT/ICDT 2011), Uppsala, Sweden 
Publisher: ACM - New York,NY, USA
Language: EN
Year: 2011
Pages: 237-248
ISBN: 978-1-4503-0528-0
Conference: EDBT
Url:EDBT 2011
Type: Conference papers (peer reviewed)
Research topic: Fast Access to Complex Data