ULMFiT Embedding(s) for Context and Extended Gloss Intersection for Marathi Word Sense Disambiguation

Sandip S Patil; R P Bhavsar; B V Pawar

Sandip S Patil, School of Computer Sciences, K.B.C. North Maharashtra University, Jalgaon, M.S., India.
R P Bhavsar, School of Computer Sciences, K.B.C. North Maharashtra University, Jalgaon, M.S., India.
B V Pawar, School of Computer Sciences, K.B.C. North Maharashtra University, Jalgaon, M.S., India.

Keywords: Marathi Word Sense Disambiguation, Lexical Relations, Neural Language Modeling, ULMFiT Model, Word Embedding.

Abstract

Ambiguity in word meanings makes natural language processing (NLP) tasks difficult; word sense disambiguation (WSD) is used to resolve it. NLP-based human assistive systems are now in demand, and in such systems machines are expected to resolve word sense ambiguities. Owing to the availability of machine-readable dictionaries, knowledge-based WSD approaches have become popular; they explore semantic relations between the contextual features and the possible glosses of a given ambiguous word. Inductive transfer learning-based language models have great potential to represent the different semantic features of a word, which can be used in various NLP tasks. Universal language model fine-tuning for text classification (ULMFiT) is a popular transfer learning model that can embed such semantic features for a digitally resource-scarce and morphologically rich language like Marathi. In the work reported here, the ambiguous words are extracted from the Marathi input sentence and their possible synsets and glosses are obtained from IndoWordNet; these glosses are then extended using hypernym and hyponym relations. Word embeddings of the Marathi context and of the extended glosses are obtained using the ULMFiT model. For the test run, we crafted a test bed of 6000 Marathi sentences, harvested from Marathi websites, covering 280 moderately ambiguous words and 1200 senses. The winning sense is the one with the maximum intersection score between the context embedding and the gloss embedding. We obtained an average accuracy of up to 57.10% on our dataset.
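
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for a ULMFiT encoder, cosine similarity is assumed as the "intersection score" (the abstract does not define it), and the sense inventory is a hypothetical English example rather than IndoWordNet data.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a ULMFiT encoder (assumption): a word-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def extend_gloss(sense):
    """Concatenate a sense's gloss with its hypernym and hyponym glosses."""
    parts = [sense["gloss"]]
    parts += sense.get("hypernym_glosses", [])
    parts += sense.get("hyponym_glosses", [])
    return " ".join(parts)


def disambiguate(context, senses):
    """Score every sense against the context and return the best one."""
    context_vec = embed(context)
    scores = {
        sense_id: cosine(context_vec, embed(extend_gloss(sense)))
        for sense_id, sense in senses.items()
    }
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical sense inventory for the English word "bank" (illustration only).
senses = {
    "bank_finance": {
        "gloss": "an institution that accepts deposits and lends money",
        "hypernym_glosses": ["an organization that handles money"],
        "hyponym_glosses": ["a bank that offers savings accounts"],
    },
    "bank_river": {
        "gloss": "sloping land beside a river",
        "hypernym_glosses": ["a slope of land"],
        "hyponym_glosses": ["the land along the edge of a river or stream"],
    },
}

context = "she sat on the bank of the river and watched the water"
winner, scores = disambiguate(context, senses)
print(winner, scores)
```

In a real pipeline, `embed` would be replaced by vectors from a fine-tuned language model and the sense entries by synsets retrieved from a lexical database; the scoring loop itself would stay the same.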
