Workshop Description
In today's applications, data is collected for multiple analysis tasks.
In complex and high-dimensional databases, each data object is described
by several features or measurements that provide a variety of
information. In such data, one typically observes several valid
groupings of the objects, i.e., each object may fit different roles.
For example, in customer segmentation, a customer may show multiple
behaviors or properties suggesting that, depending on the aspect
considered, the customer belongs to several distinct clusters. In
domains such as sensor networks, each sensor node can be a member of
multiple clusters according to different environmental events. In gene
expression analysis, genes should be detected in multiple clusters
because each gene can serve several functions. In general, many
applications require multiple groupings, as these characterize different
views of the data. In contrast to these application demands, traditional
clustering techniques detect only a single grouping and miss the
alternative clusterings.
Similarly, the topic of multiple clustering solutions itself has several facets: both multiple alternative solutions and a single consensus solution derived from multiple clusterings by ensemble techniques are important perspectives in this research field. With respect to the given information, one can distinguish multi-source clustering, where the views are given in advance, from the detection of novel views by feature selection and space transformation techniques. Further perspectives arise from the underlying data, which ranges from traditional continuous-valued vector spaces to complex databases (e.g., graphs, sequences, or streams). In all of these areas, multiple clustering solutions have opened novel research challenges. Ideas for solving these problems come from a variety of traditional mining paradigms: frequent itemset mining, ensemble mining, and constraint-based mining are only a few of the related fields in machine learning and knowledge discovery.
This cross-disciplinary research topic on multiple clustering solutions has received significant attention in recent years. However, since the field is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions in very high-dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include how to define redundancy among clusterings, whether existing algorithms can be modified to accommodate this goal, how many solutions should be extracted, how to select among far too many possible solutions, how to evaluate and visualize results, and, in brief, how to most effectively help the data analyst find what he or she is looking for. Recent work approaches this problem by looking for non-redundant, alternative, disparate, or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, the theory of result summarization, consensus mining, and general techniques for coping with complex and high-dimensional databases.
Multiple clustering solutions provide a new way of looking at the clustering problem. The panel at last year's 1st MultiClust workshop revealed several open research issues in this area. In addition, an overview and a taxonomy of existing approaches can be found in a recent tutorial on discovering multiple clustering solutions. We believe this new area will complement ongoing work in ensemble clustering, constraint-based clustering, subspace clustering, frequent pattern detection, and many other fields.
Topics of Interest
The workshop covers several aspects of multiple clustering solutions and of related research fields. A non-exhaustive list of topics of interest is given below:
- Discovering multiple clustering solutions
  - Alternative clusters / disparate clusters / orthogonal clusters
  - Multi-view clustering / subspace clustering / co-clustering
  - Multi-source clustering / clustering in parallel universes
  - Feature selection and space transformation techniques
  - Constraint-based mining for the detection of alternatives
  - Non-redundant view detection and non-redundant cluster detection
  - Model selection problem: how many clusterings / how many clusters
  - Iterative vs. simultaneous processing of multiple views
  - Scalability to large and high-dimensional databases
  - Tackling complex databases (e.g., graphs, sequences, or streams)
- Summarizing multiple clustering solutions
  - Ensemble techniques
  - Meta clustering
  - Consensus mining
  - Summarization and compression theory
- Using and evaluating multiple clustering solutions
  - Classification based on multiple clusterings
  - Evaluation metrics for multiple clustering solutions
  - Visualization and exploration of multiple clusterings
- Related research fields
  - Frequent itemset mining
  - Subgroup mining
  - Subspace learning
  - Relational data mining
  - Transfer mining
- Applications of multiple clustering solutions
  - Bioinformatics: gene expression analysis / proteomics / ...
  - Sensor network analysis
  - Social network analysis
  - Health surveillance
  - Customer segmentation
  - ... and many more ...



