Sequence Similarity Search
Continuous growth in sensor data and other temporal data increases the importance of retrieval and similarity search in time series data. Analysis of this data typically requires searching for similar time series in the data base and for interactive applications efficiency of the search process is essential.
Existing multidimensional indexes like the R-tree provide efficient querying for only relatively few dimensions. Time series are typically long which corresponds to extremely high dimensional data in multidimensional indexes. Due to massive overlap of index descriptors, multidimensional indexes degenerate for high dimensions and access the entire data by random I/O. Consequently, the efficiency benefits of indexing are lost. Therefore we develop new index structures for efficient time series retrieval and similarity search. For example, by exploiting inherent properties of time series, the developed TS-tree indexes high-dimensional data in an overlap-free manner. During query processing, powerful pruning via quantized separator and metadata information greatly reduces the number of pages which have to be accessed, resulting in substantial speed-up.
Dynamic Time Warping (DTW) is a widely used high quality similarity measure for time series. As DTW is computationally expensive, efficient algorithms for fast DTW computation are crucial. Scalability to long time series, wide DTW bands, and a high number of attributes are still challenging issues. We proposed a novel technique that exploits the inherent properties of multivariate DTW to substantially reduce the number of calculations required to compare a query time series with the time series in a data base in multistep retrieval. The significant efficiency improvements achieved result in substantial performance gains that scale well to long multivariate time series with large DTW bands. Our technique is highly flexible and can be combined with existing indexing structures and DTW filters.