The following information was submitted:
Transactions: WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS
Transactions ID Number: 89-623
Full Name: Mohammed Mazid
Position: Ph.D. Candidate
Age: ON
Sex: Male
Address: 1/46 Pillich Street, Kawana, QLD 4701
Country: AUSTRALIA
Tel: 61-431111043
Tel prefix:
Fax:
E-mail address: m.mazid@cqu.edu.au
Other E-mails: mizanul7@hotmail.com, mizanul71@yahoo.com
Title of the Paper: Input space reduction for rule based classification
Authors as they appear in the Paper: Mohammed M Mazid, A B M Shawkat Ali, Kevin S Tickle
Email addresses of all the authors: m.mazid@cqu.edu.au, s.ali@cqu.edu.au, k.tickle@cqu.edu.au
Number of paper pages: 11
Abstract: Rule based classification is one of the most popular approaches to classification in data mining. There are a number of algorithms for rule based classification. C4.5 and Partial Decision Tree (PART) are very popular among them, and both have many empirical features such as continuous number categorization, missing value handling, etc. However, in many cases these algorithms take more processing time and provide a lower accuracy rate for correctly classified instances. One of the main reasons is the high dimensionality of the databases. A large dataset might contain hundreds of attributes and a huge number of instances. We need to choose the most relevant attributes among them to obtain higher accuracy. It is also a difficult task to choose a proper algorithm to perform efficient and accurate classification. With our proposed method, we select the most relevant attributes from a dataset by reducing the input space and simultaneously improve the performance of these two rule based algorithms. The improved performance is measured in terms of better accuracy and lower computational complexity. We measure the entropy of information theory to identify the central attribute of a dataset. We then apply correlation coefficient measures, namely Pearson's, Spearman and Kendall correlation, utilizing the central attribute of the same dataset. We have conducted a comparative study using these three popular correlation coefficient measures to choose the best method. We have picked datasets from the well known UCI (University of California, Irvine) data repository. We have used box plots to compare experimental results. Our proposed method showed better performance in most of the individual experiments.
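The attribute selection idea described in the abstract could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the central attribute is derived from entropy, so the code assumes it is the attribute with the highest Shannon entropy, and it assumes the remaining attributes are ranked by absolute Pearson correlation with that central attribute and the top k are kept. The function names, the parameter k, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sd_x * sd_y)


def select_attributes(columns, k):
    """Pick a central attribute and the k attributes most correlated with it.

    ASSUMPTION: "central" means highest entropy; the abstract does not
    specify the criterion. Ranking by |Pearson r| is also an assumption.
    """
    central = max(columns, key=lambda name: entropy(columns[name]))
    scores = {
        name: abs(pearson(columns[central], col))
        for name, col in columns.items()
        if name != central
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return central, ranked[:k]


# Hypothetical toy dataset: attribute name -> column of values.
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [1, 1, 2, 2, 1, 1],
    "d": [6, 5, 4, 3, 2, 2],
}
central, selected = select_attributes(toy, k=2)
```

Spearman or Kendall correlation could be substituted for the `pearson` function in the same structure; the reduced attribute set would then be passed to a C4.5 or PART learner.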
Keywords: Classification, C4.5, PART, Entropy, Pearson's correlation, Spearman correlation, Kendall correlation
EXTENSION of the file: .pdf
Special (Invited) Session: Improved C4.5 algorithm for rule based classification
Organizer of the Session: 640-811
How Did you learn about congress:
IP ADDRESS: 138.77.2.133