Co-occurrence Matrix
Understand how words appearing together within a specific window of context create rich statistical models of meaning.
Bag-of-Words and TF-IDF create Document-Term matrices (documents as rows, words as columns). In contrast, a Co-occurrence Matrix is a Word-Word matrix (words as both rows and columns): it counts how often two words appear within a fixed "window" distance of each other in the text.
This reflects the distributional hypothesis, famously summarized by linguist J.R. Firth: "You shall know a word by the company it keeps." Words that appear in similar contexts tend to have similar meanings.
How it works: The Context Window
Assume a corpus with one sentence: "deep learning is incredibly exciting"
If we set our Window Size = 1 (look 1 word left, 1 word right), we scan the text:
- Focus on "learning": the left neighbor is "deep", the right neighbor is "is".
- Add 1 to the cells (learning, deep) and (learning, is) in the matrix. Repeating this for every focus word yields the matrix below (reproduced by the code sketch after the table).
| | deep | learning | is | incredibly | exciting |
|---|---|---|---|---|---|
| deep | 0 | 1 | 0 | 0 | 0 |
| learning | 1 | 0 | 1 | 0 | 0 |
| is | 0 | 1 | 0 | 1 | 0 |
| incredibly | 0 | 0 | 1 | 0 | 1 |
| exciting | 0 | 0 | 0 | 1 | 0 |
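As a concrete illustration, here is a minimal sketch (assuming Python with NumPy; all variable names are illustrative) that rebuilds the matrix above with a window size of 1:

```python
import numpy as np

sentence = "deep learning is incredibly exciting".split()
vocab = sorted(set(sentence), key=sentence.index)   # keep first-appearance order
index = {word: i for i, word in enumerate(vocab)}
window = 1

matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
for pos, word in enumerate(sentence):
    # look `window` words to the left and right of the focus word
    for offset in range(-window, window + 1):
        neighbor = pos + offset
        if offset != 0 and 0 <= neighbor < len(sentence):
            matrix[index[word], index[sentence[neighbor]]] += 1

print(vocab)
print(matrix)   # rows/columns follow sentence order, matching the table above
```

Because each pair is counted from both directions, the resulting matrix is symmetric.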
Advantages
- Captures semantic relationships between words (unlike BoW, which ignores context).
- Row vectors from this matrix have geometric meaning: words used in similar contexts, such as synonyms, end up close together in vector space (see the similarity sketch after this list).
- Forms the statistical backbone of GloVe embeddings and Latent Semantic Analysis (LSA).
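To see the geometric claim in practice, the snippet below (a sketch that reuses the `matrix` and `index` variables from the previous example; a one-sentence corpus only hints at the effect) compares two row vectors with cosine similarity:

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two word vectors
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

vec_deep = matrix[index["deep"]]
vec_is = matrix[index["is"]]
print(cosine(vec_deep, vec_is))   # both co-occur with "learning", so similarity > 0
```

On a large corpus, words that share many context words score much higher than unrelated words.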
Disadvantages
- Memory Intensive: the matrix is Vocab × Vocab. With V = 100,000 you would need 10 billion cells, so in practice the counts must be stored as a sparse matrix.
- Typically requires dimensionality reduction, such as truncated Singular Value Decomposition (SVD), before the vectors are practical for downstream modeling (see the sketch below).
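Both issues are usually handled together: store the counts sparsely and compress the rows with truncated SVD. The sketch below assumes SciPy and scikit-learn are available and reuses the toy counts from the table above:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Toy co-occurrence counts from the table above (deep, learning, is, incredibly, exciting)
dense_counts = np.array([[0, 1, 0, 0, 0],
                         [1, 0, 1, 0, 0],
                         [0, 1, 0, 1, 0],
                         [0, 0, 1, 0, 1],
                         [0, 0, 0, 1, 0]])

sparse_counts = csr_matrix(dense_counts)       # only non-zero cells are stored
svd = TruncatedSVD(n_components=2, random_state=0)
embeddings = svd.fit_transform(sparse_counts)  # each word becomes a dense 2-D vector
print(embeddings.shape)                        # (5, 2)
```

With a real vocabulary you would keep a few hundred components instead of two, turning a huge sparse count matrix into compact dense word vectors.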