Towards Developing Word Sense Disambiguation System for Kashmiri Language
Keywords:
Information Storage and Retrieval, Machine Learning, Natural Language Processing, Decision Trees.
Abstract
Background: A word, phrase, sentence or other communication is "ambiguous" if it can be interpreted in more than one way. The process of assigning the correct meaning to a word with respect to its context is known as Word Sense Disambiguation (WSD). WSD is regarded as a very important problem in Natural Language Processing (NLP) that requires proper attention, as it impacts the performance of various NLP applications. Objectives: This paper makes a first attempt at proposing a supervised machine learning WSD system for Kashmiri. Material & Methods: The dataset for this study, comprising 500K tokens, has been collected from different resources. A sense-annotated corpus for fifty commonly used ambiguous Kashmiri words has been created by manual annotation. Kashmiri WordNet is used to extract senses for the target words. A decision-tree-based classifier is trained on features extracted from the annotated corpus to carry out the WSD task. We have used a context window of ±3 to extract the features that are used to train the classifier. Results: The proposed system is tested on all fifty target words and evaluated using accuracy, precision, recall and F1-measure. The proposed system reported 81.831% accuracy, 0.834 precision, 0.816 recall and 0.824 F1-measure. Conclusions: This was the initial step towards developing a WSD system for Kashmiri, and it has shown good results. In the future we expect to use other algorithms to carry out this task with greater language coverage.
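To illustrate the ±3 context window mentioned in the abstract, the following is a minimal sketch of how the words surrounding a target word might be turned into features. It is not the authors' implementation: the function name `context_window_features`, the `<PAD>` token, the feature keys, and the English example sentence are all assumptions made for illustration, and the abstract does not say which features were actually extracted from the window.

```python
from typing import Dict, List

# Hypothetical padding token for window positions that fall outside the sentence.
PAD = "<PAD>"


def context_window_features(
    tokens: List[str], target_index: int, window: int = 3
) -> Dict[str, str]:
    """Collect the words within +/- `window` positions of the target word."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word itself is not a context feature
        position = target_index + offset
        in_range = 0 <= position < len(tokens)
        # Keys look like "w-3" ... "w-1", "w+1" ... "w+3".
        features[f"w{offset:+d}"] = tokens[position] if in_range else PAD
    return features


# Placeholder English sentence for illustration only; not from the paper's corpus.
sentence = "he sat on the bank of the river".split()
features = context_window_features(sentence, sentence.index("bank"))
print(features)
```

Each feature dictionary, paired with the manually annotated sense of its target word, could then serve as one training instance for a decision-tree learner once the words are encoded numerically.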
Published
2023-06-30
Section
Research Articles
Copyright (c) 2023 SAMRIDDHI: A Journal of Physical Sciences, Engineering and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.