Learning Comprehensible Models for Analysis and Prediction in Scientific Databases
Efficient handling and analysis of experimental measurements is an essential part of research and development in many disciplines (e.g., engineering, chemistry, biology), since the measurements carry information about the underlying processes. Researchers investigate these processes by running experiments and gathering potentially huge amounts of data, which must then be evaluated. In environmental monitoring, wireless sensor networks collect data at spatially and temporally discrete positions. In mechanical engineering and related areas, potentially complex test benches are set up and observations are recorded. Besides seeking an efficient and effective way of exploring multiple results, researchers strive to discover correlations within the measured data. Moreover, model-based prediction of expected measurements can be highly beneficial for designing further experiments.
Typically, analytical functions or distributions are used to model the experimental data. Such models can offer a compact and intuitive representation of the underlying processes and allow predictions at operating points for which no measurements are available. One class of simple yet powerful functions suitable for such models is that of (piecewise) linear regression functions, which are often used in scientific databases to represent the data and to answer prediction queries.
This thesis covers techniques for identifying piecewise linear models by building regression trees. New algorithmic solutions for building models that are more compact and at the same time accurate are developed and evaluated. Finally, with such models available in scientific databases, novel solutions are introduced that enable a wide range of reverse-engineered model-based predictions.
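To make the idea of a piecewise linear model identified by a regression tree concrete, the following is a minimal sketch of the simplest possible case: a tree with a single split on one input variable and a least-squares line in each of the two leaves. It is an illustration only and not the algorithm developed in the thesis; the function names `fit_line`, `fit_stump`, and `predict`, the `min_leaf` parameter, and the exhaustive search over split points are all assumptions made for this example.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, sum of squared errors)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(coef[0]), float(coef[1]), float(resid @ resid)

def fit_stump(x, y, min_leaf=2):
    """Depth-1 regression tree with a linear model in each leaf (hypothetical helper).

    Tries every split between consecutive sorted x values and keeps the one
    with the smallest total squared error of the two leaf models.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i - 1] == x[i]:
            continue  # cannot place a threshold between equal values
        a_l, b_l, sse_l = fit_line(x[:i], y[:i])
        a_r, b_r, sse_r = fit_line(x[i:], y[i:])
        sse = sse_l + sse_r
        if best is None or sse < best[0]:
            threshold = 0.5 * (x[i - 1] + x[i])
            best = (sse, threshold, (a_l, b_l), (a_r, b_r))
    if best is None:
        raise ValueError("not enough distinct points for a split")
    return best[1:]  # (threshold, (a_left, b_left), (a_right, b_right))

def predict(model, x_new):
    """Evaluate the piecewise linear model at new operating points."""
    threshold, (a_l, b_l), (a_r, b_r) = model
    x_new = np.asarray(x_new, dtype=float)
    return np.where(x_new <= threshold, a_l * x_new + b_l, a_r * x_new + b_r)
```

Once fitted, the model consists of one threshold and two coefficient pairs, so `predict` can be evaluated at operating points that were never measured. A full regression tree would apply the same split search recursively and over several input variables.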
Authors: Zimmer (née Ivanescu) A.
Published in: Dissertation, Fakultät für Mathematik, Informatik und Naturwissenschaften, RWTH Aachen University
Publisher: Apprimus-Verlag - Aachen
Defence of the PhD thesis: 03.12.2013