Volume 8 Issue 2 - March 2017

  • 1. Cluster ensembling - a technical review

    Authors : Tanushree Bhimanwar, Gauri Chaudhary

    Pages : 24-28

    DOI : http://dx.doi.org/10.21172/1.82.004

    Keywords : Clustering, categorical data, cluster ensembles, data mining, similarity measures.

    Abstract :

    In data mining, clustering is used to discover distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean) to partition the database so that data points in the same partition are more similar to each other than to points in different partitions. Clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. Various clustering algorithms have been developed to cluster or categorize data sets, but some of them cannot be applied directly to categorical data. In a cluster ensemble, the underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown. This paper presents an analysis showing that this problem degrades the quality of the clustering result, and it presents a new link-based approach that improves the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble. In particular, an efficient link-based algorithm is used for the underlying similarity assessment. To obtain the final clustering result, a graph partitioning technique is then applied to a weighted bipartite graph obtained from the refined matrix.
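
The matrix refinement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the Jaccard edge weights, the connected-triple style similarity, and the decay factor `dc` are all assumptions chosen for illustration. The sketch builds the binary cluster-association matrix from several base clusterings, then fills the zero ("unknown") entries with a similarity between the cluster a point belongs to and the other clusters of the same base clustering.

```python
import numpy as np

def association_matrix(labelings):
    """Binary matrix: rows are data points, columns are clusters from all base clusterings."""
    columns, owners = [], []
    for m, labels in enumerate(labelings):
        labels = np.asarray(labels)
        for c in np.unique(labels):
            columns.append((labels == c).astype(float))
            owners.append(m)  # remember which base clustering the column came from
    return np.column_stack(columns), np.array(owners)

def cluster_similarity(B, dc=0.9):
    """Assumed link-based similarity: clusters are similar if they share neighbours."""
    inter = B.T @ B                         # shared members between each pair of clusters
    sizes = B.sum(axis=0)
    union = sizes[:, None] + sizes[None, :] - inter
    W = inter / union                       # Jaccard edge weights of the cluster network
    np.fill_diagonal(W, 0.0)
    # For each pair (x, y), sum min(W[x, k], W[y, k]) over every third cluster k.
    triples = np.minimum(W[:, None, :], W[None, :, :]).sum(axis=2)
    np.fill_diagonal(triples, 0.0)
    peak = triples.max()
    S = dc * triples / peak if peak > 0 else np.zeros_like(triples)
    np.fill_diagonal(S, 1.0)                # a cluster is fully similar to itself
    return S

def refine(B, owners, S):
    """Replace each entry with the similarity between the point's own cluster and that column."""
    R = np.empty_like(B)
    for m in np.unique(owners):
        cols = np.flatnonzero(owners == m)
        own = cols[np.argmax(B[:, cols], axis=1)]  # each point's cluster in clustering m
        R[:, cols] = S[own][:, cols]
    return R

# Two hypothetical base clusterings of four data points.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1]]
B, owners = association_matrix(labelings)
refined = refine(B, owners, cluster_similarity(B))
```

Known memberships keep the value 1 in `refined`, while former zeros take values in [0, 1]. The refined matrix could then be treated as the weighted bipartite graph between data points and clusters that a graph partitioning step would split.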

    Citing this Journal Article :

    Tanushree Bhimanwar, Gauri Chaudhary, "Cluster ensembling - a technical review", Volume 8 Issue 2 - March 2017, 24-28