Monday, 1 August 2011

Wseas Transactions

New Subscription to Wseas Transactions

The following information was submitted:

Transactions: INTERNATIONAL JOURNAL of MATHEMATICS AND COMPUTERS IN SIMULATION
Transactions ID Number: 17-213
Full Name: Nittaya Kerdprasop
Position: Associate Professor
Age:
Sex: Female
Address: Suranaree University of Technology, School of Computer Engineering
Country: THAILAND
Tel:
Tel prefix:
Fax:
E-mail address: nittaya@sut.ac.th
Other E-mails: nittaya.k@gmail.com
Title of the Paper: The Development of Discrete Decision Tree Induction for Categorical Data
Authors as they appear in the Paper: Nittaya Kerdprasop and Kittisak Kerdprasop
Email addresses of all the authors: nittaya@sut.ac.th, kittisakThailand@gmail.com
Number of paper pages: 11
Abstract: In decision analysis, decision trees are commonly used as a visual support tool for identifying the strategy most likely to reach a desired goal. A decision tree is a hierarchical structure normally represented as a tree-like graph model. The tree consists of decision nodes, splitting paths based on the values of a decision node, and sink nodes representing final decisions. In data mining and machine learning, decision tree induction is one of the most popular classification algorithms. The popularity of decision tree induction over other data mining techniques stems from its simple structure, ease of comprehension, and ability to handle both numerical and categorical data. For numerical data with continuous values, the tree-building algorithm simply compares the values to some constant: if the attribute value is smaller than or equal to the constant, the algorithm proceeds to the left branch; otherwise, it takes the right branch. The branching process is much more complex for categorical data. The algorithm has to calculate the optimal branching decision based on the proportion of each individual value of a categorical attribute to the target attribute. A categorical attribute with many distinct values can lead to the overfitting problem. Overfitting occurs when a model becomes overly complex in an attempt to describe too many small samples, which result from categorical attributes with large numbers of distinct values. A model that overfits the training data has poor predictive performance on unseen test data. We thus propose novel techniques based on data grouping and heuristic-based selection to deal with the overfitting problem on categorical data. Our intuition rests on appropriate selection of data samples to remove random error or noise before building the model. Heuristics play their role in the pruning strategy during the model-building phase. The implementation of our proposed method is based on the logic programming paradigm, and some major functions are presented in the paper. We observe from the experimental results that our techniques work well on high-dimensional categorical data in which attributes contain fewer than ten distinct values. For attributes with large numbers of categorical values, a discretization technique is necessary.
Keywords: Overfitting problem, Categorical data, Data mining, Decision tree induction, Prolog language
EXTENSION of the file: .pdf
Special (Invited) Session: Discrete Decision Tree Induction to Avoid Overfitting on Categorical Data
Organizer of the Session: 658-388
How Did you learn about congress:
IP ADDRESS: 203.158.4.227
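The abstract contrasts numerical splits (a threshold test yielding two branches) with categorical splits (one branch per distinct value), and notes that many distinct values fragment the data and invite overfitting. A minimal illustrative sketch of the two splitting rules, not taken from the paper (the paper's actual implementation is in Prolog; the function and field names here are hypothetical):

```python
from collections import defaultdict

def split(rows, attr, threshold=None):
    """Partition rows on one attribute.

    Numeric (threshold given): rows with value <= threshold go to the
    left branch, the rest to the right, as described in the abstract.
    Categorical (no threshold): one branch per distinct value, which
    fragments the data when the attribute has many distinct values.
    """
    if threshold is not None:  # numeric split: binary threshold test
        left = [r for r in rows if r[attr] <= threshold]
        right = [r for r in rows if r[attr] > threshold]
        return {"<=": left, ">": right}
    branches = defaultdict(list)  # categorical split: branch per value
    for r in rows:
        branches[r[attr]].append(r)
    return dict(branches)

# Hypothetical sample data
rows = [{"age": 25, "color": "red"},
        {"age": 40, "color": "blue"},
        {"age": 30, "color": "red"}]

numeric = split(rows, "age", threshold=30)   # two branches
categorical = split(rows, "color")           # one branch per value
```

A categorical attribute with k distinct values produces k branches, so each branch covers only a small sample; this fragmentation is the overfitting risk that the paper's data-grouping and discretization techniques are meant to address.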