The following information was submitted:
Transactions: WSEAS TRANSACTIONS ON COMPUTERS
Transactions ID Number: 32-902
Full Name: Hsukuang Chang
Position: Senior Lecturer
Age: ON
Sex: Male
Address: No. 1, Sec. 1, Syuecheng Rd., Dashu Township, Kaohsiung
Country: TAIWAN
Tel: 07-6577711-6533
Tel prefix:
Fax:
E-mail address: hkchang@isu.edu.tw
Other E-mails:
Title of the Paper: Semantic Structural Cluster Validity for XML Documents
Authors as they appear in the Paper: Hsu-Kuang Chang, I-Chang Jou, King-Chu Hung
Email addresses of all the authors:
Number of paper pages: 10
Abstract: The amount of XML data is increasing as electronic document systems adopt XML as the standard format for document exchange. With the marked increase in the amount of online information, XML document clustering plays a critical role in the efficient organization, navigation, and retrieval of large numbers of documents. A great deal of active research is being conducted on XML document clustering to facilitate efficient analysis of the information represented in XML documents. Clustering an XML document collection involves dealing with many ambiguous categories. A clustering, i.e., a set of XML document groups, depends on the chosen clustering algorithm as well as on the algorithm's parameter settings. To find the best among several clustering results, it is common practice to evaluate their internal structures with a cluster validity measure. A clustering is considered useful if particular structural properties are well developed. Nevertheless, the presence of certain structural properties may not guarantee usefulness from the standpoint of information retrieval, e.g., whether the found XML document groups resemble the classification of a human editor. Based on already classified XML document collections, we performed clustering and compared the predicted quality of these clusters to their real quality. Our analysis included the classical cluster validity measures, modified Bezdek (abbreviated as MB) and the proposed modified Davies–Bouldin (abbreviated as MDB). MDB was more accurate than the Dunn measure. In addition, the new within-group variance (WGV)/between-group variance (BGV) measure, which combines the distance measure with the μ membership within the cluster, showed results superior to those for MB and MDB.
Keywords: DTD, XML, Clustering Validity, WGV, BGV.
EXTENSION of the file: .pdf
Special (Invited) Session:
Organizer of the Session:
How Did you learn about congress:
IP ADDRESS: 140.127.194.25