AN INFORMATION-THEORETIC METRIC BASED METHOD FOR SELECTING CLUSTERING ATTRIBUTE

Pham Cong Xuyen; Do Si Truong; Nguyen Thanh Tung

doi:10.15625/vap.2016.0005

Abstract

The clustering problem arises in many fields, such as Data Mining, Pattern Recognition, and Bioinformatics. The basic objective of clustering is to group objects into clusters so that objects in the same cluster are more similar to one another than to objects in other clusters. Recently, many researchers have contributed to categorical data clustering, where data objects are described by non-numerical attributes. In particular, rough set theory based attribute selection approaches to clustering categorical data have attracted much attention. The key to these approaches is how to select, at each step, the single attribute from the many candidates that best clusters the objects.
In this paper, we review three rough set based techniques: Total Roughness (TR), Min-Min Roughness (MMR) and Maximum Dependency Attribute (MDA), and propose MAMD (Minimum value of Average Mantaras Distance), an alternative algorithm for selecting the clustering attribute in hierarchical clustering. MAMD uses the Mantaras metric, an information-theoretic metric on the set of partitions of a finite set of objects, and seeks the clustering attribute for which the average distance between the partition generated by this attribute and the partitions generated by the other attributes is minimal. To evaluate MAMD and compare it with the three rough set based techniques, we use the concept of average intra-class similarity to measure the clustering quality of the selected attribute. The experimental results show that the clustering quality of the attribute selected by our method is higher than that of the attributes selected by the TR, MMR and MDA methods.
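The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the unnormalized form of the Mantaras distance, d(P, Q) = H(P|Q) + H(Q|P) = 2H(P, Q) - H(P) - H(Q), where H is Shannon entropy (a normalized variant divides d by the joint entropy H(P, Q)). The function names and the toy table are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mantaras_distance(col_a, col_b):
    """Unnormalized Mantaras distance between the partitions induced by two
    attribute columns: H(A|B) + H(B|A) = 2*H(A, B) - H(A) - H(B).
    (Assumption: the abstract does not say which variant the paper uses.)"""
    joint = entropy(list(zip(col_a, col_b)))
    return 2 * joint - entropy(col_a) - entropy(col_b)


def select_attribute_mamd(table):
    """table maps attribute name -> list of categorical values (one per object).
    Returns the attribute with the minimum average distance to all other
    attributes, together with the per-attribute averages."""
    names = list(table)
    avg = {}
    for a in names:
        others = [b for b in names if b != a]
        avg[a] = sum(mantaras_distance(table[a], table[b]) for b in others) / len(others)
    best = min(avg, key=avg.get)  # ties resolve to the first attribute listed
    return best, avg


# Hypothetical toy data: four objects described by three categorical attributes.
toy_table = {
    "color": ["red", "red", "blue", "blue"],
    "size": ["small", "small", "large", "large"],
    "shape": ["round", "square", "round", "square"],
}
best, averages = select_attribute_mamd(toy_table)
print(best, averages)
```

In this toy table, "color" and "size" induce the same partition, so the distance between them is zero and each has a lower average distance than "shape", whose partition is independent of the other two.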

Keywords

Data Mining, Hierarchical clustering, Categorical data, Rough sets, Clustering attribute selection

PROCEEDING

PUBLISHING HOUSE FOR SCIENCE AND TECHNOLOGY

Website: http://vap.ac.vn

Contact: nxb@vap.ac.vn
