The following information was submitted:
Transactions: WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS
Transactions ID Number: 53-898
Full Name: KunHong Liu
Position: Lecturer
Age: ON
Sex: Male
Address: School of Software, Xiamen University, Xiamen, Fujian Province, P. R. China, 361005
Country: CHINA
Tel:
Tel prefix:
Fax:
E-mail address: lkhqz@163.com
Other E-mails:
Title of the Paper: Mixed-sampling Approach to Unbalanced Data Distributions: A Case Study Involving Leukemia's Document Profiling
Authors as they appear in the Paper: Wu QingQiang;Liu Hua;Liu KunHong
Email addresses of all the authors: wuqq@xmu.edu.cn;liuhua@mail.las.ac.cn;lkhqz@xmu.edu.cn
Number of paper pages: 23
Abstract: The types of leukemia and their relationships to the literature are introduced, and on this basis a leukemia data set for classification is constructed from original data sources such as the Cancer Gene Census, PubMed and gene2pubmed. This data set, which is imbalanced, is the research object. After a review of current classification methods for imbalanced data sets, the problems of sampling in imbalanced data sets are analyzed, and a mixed-sampling method is proposed to classify the leukemia data set. The multi-class leukemia problem is transformed into a set of two-class problems. The area under the receiver operating characteristic (ROC) curve (AUC) is used to evaluate the mixed-sampling method. Experiments are then performed to verify the classification efficiency and stability of eight classification methods, and their classification results are compared. The results show that the mixed-sampling method achieves the best performance. Finally, the paper concludes with an outlook on future work.
Keywords: Leukemia, Literature Profiling, Imbalanced Data Distribution, Decision Tree, Mixed-sampling, Ensemble Learning.
EXTENSION of the file: .pdf
Special (Invited) Session:
Organizer of the Session:
How Did you learn about congress: text mining; machine learning
IP ADDRESS: 58.23.30.178