Efficient Similarity Search for Hierarchical Data in Large Databases

Structured and semi-structured object representations are getting more and more important for modern database applications. Examples for such data are hierarchical structures including chemical compounds, XML data or image data. As a key feature, database systems have to support the search for similar objects where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree structured data. As the computation of this measure is NP-complete, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters is demonstrated in extensive experiments with real-world applications.

Authors: Kailing K., Kriegel H.-P., Schönauer S., Seidl T.
Published in: Proc. 9th Int. Conf. on Extending Data Base Technology (EDBT 2004), Heraklion - Crete, Greece. Springer LNCS 2992
Publisher: Springer - Heidelberg,Germany
Sprache: EN
Jahr: 2004

(acceptance rate 14%)

Seiten: 676-693
ISBN: 3-540-21200-0
Konferenz: EDBT
URL:conference homepage
Typ: Tagungsbeiträge
Forschungsgebiet: Fast Access to Complex Data