Word Sense Disambiguation
Understand Word Sense Disambiguation (WSD) and how algorithms determine which definition of a polysemous word applies in context.
Word Sense Disambiguation (WSD)
Polysemy, the capacity for a single word to carry multiple distinct meanings (senses), is one of the long-standing challenges in computational linguistics. Word Sense Disambiguation (WSD) is the task of automatically determining which sense of a word is being used in a given sentence.
The Ambiguity Framework
Target word: "bass"
- Sense 1 (music context): "He played the bass guitar."
- Sense 2 (animal context): "I caught a massive bass while fishing."
The Lesk Algorithm (Knowledge-Based WSD)
Introduced by Michael Lesk in 1986, this is the classic dictionary-based algorithm for WSD. It relies on machine-readable dictionaries; modern implementations typically draw glosses from lexical databases such as WordNet. The core idea is elegant: count the overlapping words between the context of the sentence and the dictionary definition of each candidate sense.
How Lesk Calculates
Given the sentence: "We had grilled pine cones for dessert."
- Fetch the dictionary definitions for "pine":
- Sense A: "A kind of evergreen tree with needle-shaped leaves and cones."
- Sense B: "To waste away through sorrow or illness."
- Define the sentence context: Context = {"grilled", "cones", "dessert"}
- Calculate the intersection overlap: compare the context to each definition. Sense A overlaps on the word "cones" (score = 1); Sense B has no overlap (score = 0). The algorithm correctly assigns Sense A, as the sketch after this list shows.
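The following minimal sketch implements this simplified Lesk procedure for the example above. The hard-coded gloss dictionary and the function name are illustrative stand-ins for a real dictionary lookup:

```python
# Simplified Lesk: pick the sense whose gloss shares the most words
# with the sentence context. Glosses here are hard-coded stand-ins
# for a real dictionary (e.g., WordNet).

def simplified_lesk(context_words, sense_glosses):
    """Return the (sense, score) pair with the largest context/gloss overlap."""
    best_sense, best_score = None, -1
    context = {w.lower() for w in context_words}
    for sense, gloss in sense_glosses.items():
        gloss_words = set(gloss.lower().replace(".", "").split())
        score = len(context & gloss_words)  # count overlapping words
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score

senses = {
    "pine_tree": "A kind of evergreen tree with needle-shaped leaves and cones.",
    "pine_yearn": "To waste away through sorrow or illness.",
}
context = ["grilled", "cones", "dessert"]

print(simplified_lesk(context, senses))  # ('pine_tree', 1) -- overlap on "cones"
```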
Modern NLP WSD
The Lesk algorithm struggles in practice because dictionary glosses are notoriously short, so overlap scores are low (often zero for every sense), leaving ties the algorithm cannot break.
Deep contextualized embeddings (such as ELMo and BERT) dramatically improved WSD. Because they generate a different vector for each occurrence of a word, the embedding of "bass" in a fishing context lands far from the embedding of "bass" in a guitar context. A simple k-nearest-neighbors classifier over these embeddings, fit on a few sense-labeled examples, is enough to disambiguate new occurrences.
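As a rough sketch of this approach, the code below uses the Hugging Face transformers library to extract the contextual vector of "bass" from bert-base-uncased, then fits a 1-nearest-neighbor classifier on a handful of sense-labeled example sentences. The model choice and the tiny example set are assumptions for illustration, not a prescribed recipe:

```python
# Embedding-based WSD sketch: label a new occurrence of "bass" by its
# nearest neighbor among a few sense-labeled example embeddings.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.neighbors import KNeighborsClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_target(sentence, target):
    """Mean-pool the hidden states of the subword tokens spelling `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i : i + len(target_ids)] == target_ids:
            return hidden[i : i + len(target_ids)].mean(dim=0).numpy()
    raise ValueError(f"{target!r} not found in sentence")

# Tiny sense-labeled "training set" (illustrative examples).
examples = [
    ("He played the bass guitar on stage.", "music"),
    ("She tuned the bass before the concert.", "music"),
    ("I caught a massive bass while fishing.", "fish"),
    ("The lake is full of bass in summer.", "fish"),
]
X = [embed_target(sentence, "bass") for sentence, _ in examples]
y = [label for _, label in examples]

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([embed_target("The bass bit the lure hard.", "bass")]))
# typically prints ['fish']
```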