Volume 6 Issue 4 - March 2016

  • 1. Performance analysis of semantic similarity from Wikipedia using AUC

    Authors : Arya S, Shanmugapriya S

    Pages : 228 - 234

    Keywords : Web Mining, Machine Learning, Snippets, Support Vector Machine, LIBSVM, Area Under Curve

    Abstract :

    Measuring the semantic similarity between words or phrases plays a vital role in Natural Language Processing and Information Retrieval tasks such as word sense disambiguation and query expansion. Similarity can be computed from a thesaurus such as WordNet or from statistics over a large corpus. This paper presents a Wikipedia-based measurement of semantic similarity using LIBSVM. The words in the snippets obtained from Wikipedia are processed with a stemming algorithm, and stop words are removed from the documents. TF-IDF provides a weighting scheme that gives the attribute-value representation of word pairs in documents. The resulting feature vectors are used in SVM classification. We used LIBSVM in the RapidMiner data mining tool for experimental evaluation. Our method was evaluated on the WordSim353 similarity dataset. LIBSVM is trained to distinguish synonymous word pairs from non-synonymous word pairs. The evaluation results show that our method achieves higher accuracy and AUC than the existing methods. Our experiment also shows that AUC is, in general, a better measure than accuracy and should be considered in machine learning applications.
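The following is a minimal sketch of two steps named in the abstract: TF-IDF weighting after stop-word removal, and the comparison of AUC with accuracy. It is not the authors' RapidMiner/LIBSVM pipeline. Stemming and SVM training are omitted, and the stop-word list, the snippets, the labels, and the classifier scores are all made-up assumptions chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper does not specify which list was used.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf = term count / document length, idf = log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions when scores are cut at `threshold`."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy snippets (invented, not Wikipedia data).
snippets = [
    "The car is an automobile.",
    "A car and a journey.",
    "The journey of the voyage.",
]
vectors = tf_idf(snippets)

# Toy labels (1 = synonymous pair) and scores from two imaginary classifiers.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.4, 0.95, 0.2, 0.1]

print("accuracy:", accuracy(labels, scores_a), accuracy(labels, scores_b))
print("AUC:     ", auc(labels, scores_a), auc(labels, scores_b))
```

On this toy data both score lists give the same accuracy at a threshold of 0.5, while their AUC values differ because AUC also reflects how well positives are ranked above negatives. This illustrates the kind of distinction the abstract refers to; it does not reproduce the paper's results.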

    Citing this Journal Article :

    Arya S, Shanmugapriya S, "Performance analysis of semantic similarity from Wikipedia using AUC", Volume 6 Issue 4 - March 2016, 228 - 234