Performance analysis of semantic similarity from wikipedia using auc

Arya S; Shanmugapriya S

International Journal of Latest Trends in Engineering and Technology

Index Copernicus, ICV - 77.02/100
Index Copernicus, ICV - 7.39/10
e-ISSN : 2278-621X
Cosmos Impact Factor - 4.490/10
Global Impact Factor - 0.685/1
p-ISSN : 2319-3778

Volume 6 Issue 4 - March 2016

Pages : 228 - 234

Keywords : Web Mining, Machine Learning, Snippets, Support Vector Machine, LIBSVM, Area Under Curve
Abstract :
Measuring the semantic similarity between words or phrases plays a vital role in Natural Language Processing and Information Retrieval tasks such as Word Sense Disambiguation and Query Expansion. Similarity can be computed from a thesaurus such as WordNet or from statistics gathered over a large corpus. This paper presents a Wikipedia-based measurement of semantic similarity using LIBSVM. The words in the snippets obtained from Wikipedia are processed with a stemming algorithm, and stop words are removed from the documents. TF-IDF provides a weighting scheme that gives the attribute-value representation of word pairs in documents. The resulting feature vectors are used in SVM classification. We used LIBSVM in the RapidMiner data mining tool for experimental evaluation. Our method was evaluated using the WordSim353 similarity dataset. LIBSVM is trained to classify between synonymous and non-synonymous word pairs. The evaluation results show that our method achieves higher accuracy and AUC than the existing methods. Our experiment also shows that AUC is, in general, a better measure than accuracy and should be considered in machine learning applications.
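As an illustration of the steps named in the abstract, the following minimal Python sketch shows stop-word removal, stemming, TF-IDF weighting, and an AUC computation compared against accuracy. It is not the authors' RapidMiner/LIBSVM setup: the stop-word list, the crude suffix stripper `naive_stem`, the `tf * log(N / df)` weighting variant, and the toy labels and scores are all assumptions made for this example, and the SVM training step is omitted.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to"}


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, then stem."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [preprocess(d) for d in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenised for term in set(doc))
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data: 1 = synonymous pair, 0 = non-synonymous pair (made-up scores).
labels = [1, 1, 0, 0]
scores_a = [0.9, 0.4, 0.6, 0.1]
scores_b = [0.55, 0.4, 0.9, 0.1]
```

On this toy data both score lists give the same accuracy at a 0.5 threshold, yet `auc(labels, scores_a)` is higher than `auc(labels, scores_b)` because the first list ranks more synonymous pairs above non-synonymous ones. This is one way to see why AUC can separate classifiers that accuracy treats as equal.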

Citing this Journal Article :
Arya S, Shanmugapriya S, "Performance analysis of semantic similarity from wikipedia using auc", International Journal of Latest Trends in Engineering and Technology, Volume 6 Issue 4 - March 2016, 228 - 234
