Workshop Description

In today's applications, data is collected for multiple analysis tasks. In complex and high dimensional databases, each data object is described by several features or measurements that provide a variety of information. In such data, one typically observes several valid groupings for each object, i.e., objects fit into different roles. For example, in customer segmentation, a customer might show multiple behaviors or properties that suggest membership in several distinct clusters, depending on the aspect considered. In domains such as sensor networks, each sensor node can be a member of multiple clusters according to different environmental events. In gene expression analysis, genes should be detected in multiple clusters because each gene may have various functions. In general, many applications desire multiple groupings, as they characterize different views of the data. In contrast to these application demands, traditional clustering techniques detect only a single grouping and miss the alternative clusterings.

Similarly, the topic of multiple clustering solutions fits into several roles: both multiple alternative solutions and a single consensus derived from multiple clusterings by ensemble techniques are important perspectives on this research field. With respect to the given information, multi-source clustering works with views that are given in advance, in contrast to the detection of novel views by feature selection and space transformation techniques. Further perspectives arise from the underlying data: from traditional continuous-valued vector spaces up to complex databases (e.g., graphs, sequences, or streams). In all of these areas, multiple clustering solutions have opened novel research challenges. Ideas to solve these problems come from a variety of traditional mining paradigms: frequent itemset mining, ensemble mining, and constraint-based mining are only a few of the related fields from machine learning and knowledge discovery.

This cross-disciplinary research topic on multiple clustering solutions has received significant attention in recent years. However, since it is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions from very high dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include how to define redundancy among clusterings, whether existing algorithms can be modified to accommodate this goal, how many solutions should be extracted, how to select among far too many possible solutions, and how to evaluate and visualize results; in brief, how to most effectively help data analysts find what they are looking for. Recent work approaches this problem by looking for non-redundant, alternative, disparate or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on summarization of results, consensus mining and general techniques coping with complex and high dimensional databases.
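How redundancy among clusterings should be defined is left open here. As one commonly used option, not a definition prescribed by the workshop, redundancy between two clusterings of the same objects can be quantified by normalized mutual information (NMI): values near 1 mean one clustering largely restates the other, while values near 0 suggest a genuinely alternative grouping. The sketch below assumes clusterings are given as equal-length label lists; the function names are illustrative only, and the geometric-mean normalization is one of several conventions in use.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two clusterings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * log(n * nij / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the geometric mean of the two entropies (assumed convention)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # Degenerate case: a single-cluster labeling. Two single-cluster labelings
        # are identical partitions (1.0); otherwise nothing is shared (0.0).
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


# Two hypothetical groupings of four customers.
by_spending = [0, 0, 1, 1]
by_region = [0, 1, 0, 1]
print(nmi(by_spending, by_region))  # 0.0: an orthogonal, non-redundant alternative
print(nmi(by_spending, [1, 1, 0, 0]))  # approximately 1.0: a mere relabeling
```

A measure like this only scores pairs of clusterings; deciding how many alternatives to keep, or what threshold counts as "redundant", remains one of the open questions listed above.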

Multiple clustering solutions provide a new way of looking at the clustering problem. The panel at last year's 1st MultiClust workshop revealed several open research issues in this area. In addition, an overview and a taxonomy of existing approaches in this area can be found in a recent tutorial on discovering multiple clustering solutions. We believe this new area will complement ongoing work in ensemble clustering, constraint-based clustering, subspace clustering, frequent pattern detection and many more.


Topics of Interest

The workshop covers several aspects of multiple clustering solutions and of related research fields. A non-exhaustive list of topics of interest is given below:

  • Discovering multiple clustering solutions
    • Alternative clusters / disparate clusters / orthogonal clusters
    • Multi-view clustering / subspace clustering / co-clustering
    • Multi-source clustering / clustering in parallel universes
    • Feature selection and space transformation techniques
    • Constraint-based mining for the detection of alternatives
    • Non-redundant view detection and non-redundant cluster detection
    • Model selection problem: how many clusterings / how many clusters
    • Iterative vs. simultaneous processing of multiple views
    • Scalability to large and high dimensional databases
    • Tackling complex databases (e.g. graphs, sequences, or streams)
  • Summarizing multiple clustering solutions
    • Ensemble techniques
    • Meta clustering
    • Consensus mining
    • Summarization and compression theory
  • Using and evaluating multiple clustering solutions
    • Classification based on multiple clusterings
    • Evaluation metrics for multiple clustering solutions
    • Visualization and exploration of multiple clusterings
  • Related research fields
    • Frequent itemset mining
    • Subgroup mining
    • Subspace learning
    • Relational data mining
    • Transfer mining
  • Applications of multiple clustering solutions
    • Bioinformatics: gene expression analysis / proteomics / ...
    • Sensor network analysis
    • Social network analysis
    • Health surveillance
    • Customer segmentation
    • ... and many more ...
We encourage submissions describing innovative work in related fields that address the issue of multiplicity in data mining.

List of invited talks

Michael Houle (National Institute of Informatics, Japan)

Title "Combinatorial Approaches to Clustering and Feature Selection"

Abstract:
One of the most serious difficulties in the analysis of high-dimensional data sets involves the treatment of measures of similarity. Although similarity measures often retain some discriminative ability as the dimension increases, the similarity values themselves are often difficult to interpret. Methods for search, clustering and feature selection that perform quantitative tests of similarity values (as opposed to comparative tests) are particularly susceptible to this problem. This presentation will be concerned with combinatorial models of clustering based on shared neighbor information, and their application to feature selection, subspace clustering, and multiple clustering. The models assume a secondary, derived form of similarity measure based on the intersection properties of neighborhoods defined according to the original similarity measure. The use of secondary similarity has recently been shown to offer solutions that are more robust and more scalable with respect to the dimension of the data.
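To make "secondary similarity based on the intersection of neighborhoods" concrete, here is a minimal sketch of the generic shared-nearest-neighbor idea, not necessarily the exact model presented in the talk. It assumes Euclidean distance as the primary measure, k-nearest-neighbor sets that exclude the point itself, and the overlap size divided by k as the secondary similarity; the function names are hypothetical.

```python
from math import dist


def knn_sets(points, k):
    """Index sets of the k nearest neighbors of every point under the primary
    (Euclidean) distance. The point itself is excluded; ties go to the lower index."""
    neighborhoods = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighborhoods.append(set(others[:k]))
    return neighborhoods


def shared_neighbor_similarity(points, k):
    """Secondary similarity matrix: |N_k(x) & N_k(y)| / k for every pair of points."""
    nn = knn_sets(points, k)
    n = len(points)
    return [[len(nn[i] & nn[j]) / k for j in range(n)] for i in range(n)]


# Two small, well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
s = shared_neighbor_similarity(pts, k=2)
print(s[0][1], s[0][3])  # 0.5 0.0
```

The secondary value depends only on neighbor ranks, not on the raw distance magnitudes, which is the property the abstract links to robustness in high dimensions. Variants differ on whether a point counts as its own neighbor and on how the overlap is normalized.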

Short Bio:
Michael Houle obtained his PhD degree from McGill University in 1989, in the area of computational geometry. Since then, he developed research interests in algorithmics, data structures, and relational visualization, first as a research associate at Kyushu University and the University of Tokyo in Japan, and from 1992 at the University of Newcastle and the University of Sydney in Australia. From 2001 to 2004, he was a Visiting Scientist at IBM Japan's Tokyo Research Laboratory, where he first began working on approximate similarity search and shared-neighbor clustering methods for data mining applications. Currently, he is a Visiting Professor at the National Institute of Informatics (NII), Japan. He received the IEEE ICDM 2010 Best Research Paper Award for his work on the intrinsic dimensional analysis of local outlier detection.

Bart Goethals (University of Antwerp, Belgium)

Title "Cartification: from Similarities to Itemset Frequencies"

Abstract:
Suppose we are given a multi-dimensional dataset. For every point in the dataset, we create a transaction, or cart, in which we store the k-nearest neighbors of that point for one of the given dimensions. The resulting collection of carts can then be used to mine frequent itemsets; that is, sets of points that are frequently seen together in some dimensions. Essentially, this transformation, which we call cartification, combines multiple distance measures without suffering from the curse of dimensionality. Moreover, experimentation shows that finding a good clustering, outliers, cluster centers, or even subspace clustering becomes easy on the cartified dataset using state-of-the-art techniques in mining interesting itemsets.
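The transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: one cart per (point, dimension) pair, absolute difference as the per-dimension distance, and the point counted as its own nearest neighbor. For brevity it only counts frequent pairs of points instead of running a full itemset miner; `cartify` and `frequent_pairs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def cartify(data, k):
    """Build one cart per (point, dimension): the k points closest to that point
    when only that single dimension is considered. Sorting on (distance, j != i)
    guarantees the point itself comes first among zero-distance ties."""
    n, d = len(data), len(data[0])
    carts = []
    for dim in range(d):
        for i in range(n):
            ranked = sorted(
                range(n),
                key=lambda j: (abs(data[i][dim] - data[j][dim]), j != i),
            )
            carts.append(frozenset(ranked[:k]))
    return carts


def frequent_pairs(carts, min_support):
    """Count how many carts contain each pair of points and keep pairs that meet
    min_support. Only 2-itemsets are mined; an algorithm such as Apriori or Eclat
    would also find larger itemsets."""
    counts = Counter()
    for cart in carts:
        counts.update(combinations(sorted(cart), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Four hypothetical 2-D points forming two tight groups.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 5.2)]
print(frequent_pairs(cartify(data, k=2), min_support=3))  # {(0, 1): 4, (2, 3): 4}
```

Points that keep landing in the same carts across dimensions surface as frequent itemsets, which is how clusters can be read off the cartified data without computing any multi-dimensional distance.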

Short Bio:
Bart Goethals is a professor at the Department of Mathematics and Computer Science of the University of Antwerp in Belgium. He leads the Data Mining lab of the Advanced Database Research and Modeling (ADReM) research group, which performs fundamental research on the structures, the basic properties and the power of languages, algorithms and methodologies for processing and analysing large quantities of data. His primary research interest is the study of data mining techniques to efficiently find interesting patterns and properties in large databases. He received the IEEE ICDM 2001 Best Paper Award and the PKDD 2002 Best Paper Award for his theoretical studies on frequent itemset mining. He has served as organizer and program chair of several leading workshops and conferences in the field, such as ECML PKDD 2008 and SIAM DM 2010. He is currently general chair of the ECML PKDD Steering Committee, associate editor of the Data Mining and Knowledge Discovery journal and the Knowledge and Information Systems journal, and editor in chief of the ACM SIGKDD Explorations newsletter.

Submission Instructions

We invite submissions of unpublished original research papers that are not under review elsewhere. All papers will be peer reviewed. If a paper is accepted, at least one of the authors must attend the workshop to present the work. The submitted papers must be written in English and formatted according to the Springer-Verlag Lecture Notes in Artificial Intelligence guidelines. Authors' instructions and style files can be downloaded at: http://www.springer.de/comp/lncs/authors.html

The maximum length of papers is 12 pages in this format. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 6 pages.

The papers should be in PDF format and submitted via the EasyChair submission site.

If you are considering submitting to the workshop and have questions regarding the workshop scope or need further information, please do not hesitate to contact the PC chairs.

Proceedings and Awards

We will publish online proceedings of all accepted papers so that the results are widely accessible. Proceedings will be published through the CEUR Workshop Proceedings (CEUR-WS.org) publication service in time for the workshop. If there is sufficient interest and quality of papers, we will also consider a post-workshop publication (e.g., as a special issue in a journal).

Among the accepted papers, a Best Paper Award of 300 EUR will be granted to an innovative contribution in the new field of multiple clusterings.

Download complete proceedings:
2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings

Important Dates

Submission deadline: June 14, 2011
Acceptance notification: July 1, 2011
Camera-ready deadline: July 21, 2011
Workshop: September 5, 2011

Workshop Program

Invited Talk (09:00 - 09:30)


Technical Talks (09:30 - 10:30)


Coffee Break (10:30 - 11:00)



Invited Talk (11:00 - 11:30)


Technical Talks (11:30 - 12:15)


Discussion Panel (12:15 - 12:30)

Organizers

Program Committee

  • James Bailey (University of Melbourne, Australia)
  • Carlotta Domeniconi (George Mason University, USA)
  • Ines Färber (RWTH Aachen University, Germany)
  • Vivekanand Gopalkrishnan (Nanyang Technological University, Singapore)
  • Dimitrios Gunopulos (University of Athens, Greece)
  • Michael Houle (National Institute of Informatics, Japan)
  • Daniel Keim (University of Konstanz, Germany)
  • Themis Palpanas (University of Trento, Italy)
  • Magda Procopiuc (AT&T Research, USA)
  • Naren Ramakrishnan (Virginia Tech, USA)
  • Jörg Sander (University of Alberta, Canada)
  • Alexander Topchy (Nielsen Media Research)
  • Lyle H. Ungar (University of Pennsylvania, USA)
  • Jilles Vreeken (University of Antwerp, Belgium)
  • Wei Wang (University of North Carolina at Chapel Hill, USA)
  • Arthur Zimek (University of Munich, Germany)