The following information was submitted:
Transactions: INTERNATIONAL JOURNAL of APPLIED MATHEMATICS AND INFORMATICS
Transactions ID Number: 20-656
Full Name: Tengku Sembok
Position: Professor
Age: ON
Sex: Male
Address: Faculty of Information Science and Technology
Country: MALAYSIA
Tel: 0123373539
Tel prefix: +6
Fax: +60389256732
E-mail address: tmtsembok@gmail.com
Other E-mails: tmts@ftsm.ukm.my
Title of the Paper: Effectiveness of Stemming and n-grams String Similarity Matching on Malay Documents
Authors as they appear in the Paper: Tengku Mohd T. Sembok and Zainab Abu Bakar
Email addresses of all the authors: tmtsembok@gmail.com, zainab@fsmtk.uitm.edu.my
Number of paper pages: 8
Abstract: There are two main classes of conflation algorithms, namely string-similarity algorithms and stemming algorithms. Two string-similarity matching algorithms, bi-grams and tri-grams, are used in the experiments conducted on Malay texts. The Malay stemming algorithm used in the experiments was developed by Fatimah et al. Inherent characteristics of n-grams on Malay documents are discussed in this paper. Retrieval effectiveness experiments using several combinations of n-gram matching and stemming are performed in order to find the best combination. The variations tested are: both queries and documents nonstemmed; stemmed queries and nonstemmed documents; and both queries and documents stemmed. Further experiments are then carried out by removing the most frequently occurring n-grams. Besides using Dice coefficients to rank documents, inverse document frequency (idf) weights are also used. An interpolation technique and standard recall-precision functions are used to calculate recall-precision values. It is found that a combined search, using both n-gram matching and stemming, improves retrieval effectiveness. Removing the most frequently occurring n-gram, which appears in about 46% of the words, also improves retrieval effectiveness.
Keywords: Information Retrieval, String Similarity Matching, Stemming Algorithms.
EXTENSION of the file: .doc
Special (Invited) Session: Characteristics and retrieval effectiveness of n-gram string similarity matching on Malay documents
Organizer of the Session: 653-228
How Did you learn about congress: Prof. Dr. Shahrul Azman Mohd Noah, samn@ftsm.ukm.my
IP ADDRESS: 60.50.189.118