AN INFORMATION-THEORETIC METRIC BASED METHOD FOR SELECTING CLUSTERING ATTRIBUTE
DOI: 10.15625/vap.2016.0005
Abstract
Clustering problem appears in many different fields like Data Mining, Pattern Recognition, Bioinfor-matics, etc. The basic objective of clustering is to group objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters. Recently, many researchers have contributed to categorical data clustering, where data objects are made up of non-numerical attributes. Especially, rough set theory based attribute selection clustering approaches for categorical data have attracted much attention. The key to these approaches is how to select only one attribute that is the best to cluster the objects at each time from many candidates of attributes.
In this paper, we review three rough set based techniques: Total Roughness (TR), Min-Min Roughness (MMR) and Maximum Dependency Attribute (MDA), and propose MAMD (Minimum value of Average Mantaras Distance), an alternative algorithm for hierarchical clustering attribute selection. MAMD uses Mantaras metric which is an information-theoretic metric on the set of partitions of a finite set of objects and seeks to determine a clustering attribute such that the average distance between the partition generated by this attribute and the partitions generated by other attributes of the objects has a minimum value. To evaluate and compare MAMD with three rough set based techniques, we use the concept of average intra-class similarity to measure the clustering quality of selected attribute. The experiment results show that the clustering quality of the attribute selected by our method is higher than that of attributes selected by TR, MMR and MDA methods.
In this paper, we review three rough set based techniques: Total Roughness (TR), Min-Min Roughness (MMR) and Maximum Dependency Attribute (MDA), and propose MAMD (Minimum value of Average Mantaras Distance), an alternative algorithm for hierarchical clustering attribute selection. MAMD uses Mantaras metric which is an information-theoretic metric on the set of partitions of a finite set of objects and seeks to determine a clustering attribute such that the average distance between the partition generated by this attribute and the partitions generated by other attributes of the objects has a minimum value. To evaluate and compare MAMD with three rough set based techniques, we use the concept of average intra-class similarity to measure the clustering quality of selected attribute. The experiment results show that the clustering quality of the attribute selected by our method is higher than that of attributes selected by TR, MMR and MDA methods.
Keywords
Data Mining, Hierarchical clustering, Categorical data, Rough sets, Clustering attribute selection
Full Text:
PDFCopyright (c) 2017 PROCEEDING of Publishing House for Science and Technology
PROCEEDING
PUBLISHING HOUSE FOR SCIENCE AND TECHNOLOGY
Website: http://vap.ac.vn
Contact: nxb@vap.ac.vn