Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams

Classification of streaming data faces three basic challenges: it must cope with huge amounts of data, it must make the best possible use of the varying time between two stream items (anytime classification), and it must incrementally learn additional training data (anytime learning) so that the classifier can be applied consistently to fast data streams. In this work, we propose a novel index-based technique that handles all three challenges using the established Bayes classifier on effective kernel density estimators. Our novel Bayes tree automatically generates a hierarchy of mixture densities that represent kernel density estimators at successively coarser levels; the hierarchy is adapted efficiently to the individual object to be classified. Our probability density queries, together with novel classification improvement strategies, provide the information needed for very effective classification at any point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree.
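
The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the Bayes tree from the paper. It assumes one-dimensional data, Gaussian kernels, and only two levels per class: a single coarse Gaussian and the full kernel density estimator. The names ClassModel and anytime_classify, the bandwidth value, the toy data, and the refinement strategy (refine the currently most probable class first) are all hypothetical choices made for illustration.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class ClassModel:
    """Two-level density model for one class (illustrative, not the Bayes tree)."""

    def __init__(self, points, bandwidth):
        self.points = list(points)
        self.bandwidth = bandwidth
        n = len(self.points)
        self.mean = sum(self.points) / n
        # Coarse level: one Gaussian whose variance is the spread of the
        # kernel centres plus the variance of a single kernel.
        spread = sum((p - self.mean) ** 2 for p in self.points) / n
        self.var = spread + bandwidth ** 2

    def coarse_density(self, x):
        return gaussian(x, self.mean, self.var)

    def fine_density(self, x):
        # Fine level: kernel density estimator, one Gaussian kernel per point.
        h2 = self.bandwidth ** 2
        return sum(gaussian(x, p, h2) for p in self.points) / len(self.points)


def anytime_classify(x, models, priors, budget):
    """Return the most probable class after at most `budget` refinement steps."""
    refined = set()

    def score(label):
        model = models[label]
        if label in refined:
            density = model.fine_density(x)
        else:
            density = model.coarse_density(x)
        return priors[label] * density  # unnormalised Bayes posterior

    for _ in range(budget):
        candidates = [label for label in models if label not in refined]
        if not candidates:
            break  # every class is already at its finest level
        # Assumed strategy: refine the currently most probable unrefined class.
        refined.add(max(candidates, key=score))
    return max(models, key=score)


# Toy data: class "a" is bimodal, class "b" has one point near the query.
models = {
    "a": ClassModel([0.0, 0.2, 4.0, 4.2], bandwidth=0.3),
    "b": ClassModel([2.0, 9.0, 10.0, 11.0], bandwidth=0.3),
}
priors = {"a": 0.5, "b": 0.5}

for budget in range(3):
    print(budget, anytime_classify(2.0, models, priors, budget))
```

On this toy data the coarse Gaussian of class "a" covers the gap between its two modes, so an interruption before any refinement yields "a". Once the models are refined, the answer changes to "b". This shows why a longer time budget can improve the decision while a usable answer is available at every point.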

Authors: Seidl T., Assent I., Kranen P., Krieger R., Herrmann J.
Published in: Proc. 12th International Conference on Extending Database Technology (EDBT/ICDT 2009), Saint-Petersburg, Russia.
Publisher: ACM - New York, NY, USA
Language: EN
Year: 2009
Pages: 311-322
ISBN: 978-1-60558-422-5
Conference: EDBT
DOI: 10.1145/1516360.1516397
Type: Conference paper
Research area: Data Analysis and Knowledge Extraction