The following information was submitted:
Transactions: WSEAS TRANSACTIONS ON COMPUTERS
Transactions ID Number: 52-429
Full Name: Mahmoud Rammal
Position: Assistant Professor
Age: ON
Sex: Male
Address: Sami Solh Street - Bp 5396/116
Country: LEBANON
Tel: 00961 1 423 137
Tel prefix: 961 1
Fax: 00961 1 423 139
E-mail address: rammal.mahmoud@gmail.com
Other E-mails: mrammal@ul.edu.lb
Title of the Paper: Improving Arabic Information Retrieval System using n-gram method
Authors as they appear in the Paper: Mahmoud Rammal - Majed Sanan - Khaldoun Zreik
Email addresses of all the authors: mrammal@ul.edu.lb, Sinane80@hotmail.com, zreik@univ-paris8.fr
Number of paper pages: 10
Abstract: This paper applies N-gram-based indexing and retrieval to the Arabic legal language used in documents of the official Lebanese government journal. We use N-grams, over both words and characters, as the representation method, and compare the results under the vector space model with three measures: TF*IDF weighting, Dice's coefficient, and the cosine coefficient. The experiments show that indexing Arabic documents with trigrams is the best choice for Arabic information retrieval using N-grams. However, N-grams alone are still insufficient for indexing and retrieving legal Arabic documents with good results; a linguistic approach that uses a legal thesaurus or an ontology of juridical language is indispensable.
Keywords: Arabic language, Indexing, N-grams, Information Retrieval, Word segmentation
EXTENSION of the file: .rtf
Special (Invited) Session:
Organizer of the Session:
How Did you learn about congress: Information Retrieval, N-gram, Categorisation
IP ADDRESS: 194.126.23.213