Adaptable Similarity Search in 3-D Spatial Database Systems
Similarity search in database systems is becoming an increasingly important task inmodern application domains such as multimedia, CAD, molecular biology, medical imagingand many others. By investigating novel adaptable similarity models and newalgorithms for efficient similarity query processing a fundamental requirement of similaritysearch in very large database systems is targeted. Particular attention is paid to thebasic observation that similarity highly depends upon the requirements of specific applicationsand on the changing needs of individual users.After some preliminary work concerning the subject of similarity search in spatialdatabase systems and a survey of the 3-D protein database system which is covered inpart I, several adaptable similarity models are presented in part II. Firstly, the fundamentalnotion of quadratic form distance functions, d_A^2(x, y) = (x-y) * A * (x-y)^T, is introduced.By providing appropriate feature transforms such as the computation of histogramsand by specifying and modifying similarity matrices A, the basic model is shownto support particular adaptability for a wide variety of applications. The potentials of thismulti-purpose approach are demonstrated through several examples including colororientatedsimilarity of image, shape-orientated similarity of images, histogram-basedsimilarity of 3-D shapes and approximation-based shape similarity of 3-D surface segments.In addition to the availability of adaptable similarity models, the aspect of efficientquery processing becomes ever more important due to the increasing sizes of large andvary large databases. Part III is dedicated to the topic of efficient support for adaptablesimilarity query processing and begins with the presentation of a novel algorithm foroptimal multi-step nearest-neighbor search. The multi-purpose quadratic form similaritydistance functions represent ellipsoid queries which are a new query type for spatialdatabase systems. New algorithms are introduced for efficiently processing ellipsoidqueries on multidimensional index structures and particular care is taken for the adaptabilityof the similarity model by the user. Specifically, this algorithm supports modificationsof the similarity matrix (i.e: the query ellipsoid) at query time. After investigatingseveral approximation techniques for ellipsoid queries a multi-step technique forefficient ellipsoid query processing in even high-dimensional spaces is presented thatapplies existing as well as new techniques for the reduction of dimensionality to queryellipsoids. The reduced similarity query is again an ellipsoid query and guarantees nofalse drops in the multi-step query processor. In particular, the reduced ellipsoid queryrepresents the greatest of all lower-bounding filter distance functions thus ensuring theoptimal filter selectivity. Extensive experiments on an image database containing112,000 color images and 10,000 grayscale images as well as on the 3-D protein databasesystem containing 5,000 molecules with 94,000 surface segments demonstrate theeffectiveness and high efficiency of these new concepts in various dimensions from 5-Dup to 4,096-D.
|Published in:||Dissertation (Ph.D. thesis), Faculty of Mathematics and Computer Science, University of Munich. Herbert Utz Verlag, Munich|
|Forschungsgebiet:||Exploration of Multimedia Databases|