Word Embeddings
The transition from sparse, high-dimensional vectors to dense, continuous, low-dimensional vector spaces capable of capturing complex meaning.
Introduction to Word Embeddings
We've looked at One-Hot, Bag-of-Words (BoW), and TF-IDF encoding. All of these generate Sparse Vectors (mostly zeros) whose length equals the size of the vocabulary, often 50,000+ dimensions. Word Embeddings, popularized in 2013, represented a paradigm shift: migrating from Sparse Vectors to Dense Vectors.
Sparse Vector (One-Hot)
"King" = [0, 0, 1, 0, 0, 0, 0, 0, 0....]
"Man" = [0, 0, 0, 0, 0, 1, 0, 0, 0....]
Dense Vector (Embedding)
"King" = [0.98, 0.45, -0.6, 0.12, 0.8]
"Man" = [0.93, 0.41, -0.9, 0.15, 0.3]
How Dense Embeddings Work
Rather than counting words, an embedding model uses Neural Networks to map words into a continuous geometric space. Each dimension (number) in the fixed-length vector subtly captures a latent semantic feature (e.g., gender, royalty, color, sentiment).
- Because the dimensions are dense (floats, typically between -1 and 1, instead of mostly 0s), they compress vast vocabulary context into just a few hundred dimensions (300 is a common choice).
- Cosine Similarity, which compares the angle between two vectors, measures how conceptually similar two words are (see the sketch below).
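A minimal sketch of that measurement, reusing the toy "King" and "Man" vectors from above (the values are illustrative, not from a real model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

king = np.array([0.98, 0.45, -0.6, 0.12, 0.8])
man  = np.array([0.93, 0.41, -0.9, 0.15, 0.3])

print(cosine_similarity(king, man))  # ~0.92: high value -> similar concepts
```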
The Classic "Big 3" Static Embeddings
1. Word2Vec (2013)
Developed by Google
A predictive model that uses a shallow Neural Network to guess a word from its neighbors (the CBOW architecture) or the neighbors from the word (Skip-gram).
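As a rough sketch of what training looks like in practice, here is how the gensim library's Word2Vec class could be used on a toy corpus (the sentences and hyperparameters are illustrative):

```python
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "man", "walks", "to", "the", "castle"],
    ["the", "queen", "rules", "beside", "the", "king"],
]

# sg=1 selects Skip-gram (predict neighbors from the word);
# sg=0 would select CBOW (predict the word from its neighbors)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv["king"][:5])           # first 5 dims of the dense vector
print(model.wv.most_similar("king"))  # nearest neighbors in the space
```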
2. GloVe (2014)
Developed by Stanford
A count-based model that performs matrix factorization on a gigantic global word Co-occurrence Matrix to derive vectors.
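GloVe vectors are usually downloaded pretrained rather than trained locally. A minimal loading sketch, assuming a standard GloVe text file where each line holds a word followed by its floats (the filename is a placeholder for one of the Stanford releases):

```python
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Parse a GloVe text file: one word followed by its floats per line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Hypothetical path to a pretrained file from the Stanford GloVe project
vectors = load_glove("glove.6B.100d.txt")
print(vectors["king"].shape)  # (100,)
```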
3. FastText (2016)
Developed by Facebook AI
An extension of Word2Vec that trains on sub-word character N-grams (e.g., "apple" → "app", "ppl", "ple"). Because unseen words share n-grams with known ones, it can produce vectors for out-of-vocabulary words and even spelling errors.
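As a hedged sketch of that behavior, gensim's FastText class can produce a vector for a misspelling that never appeared in training (the toy corpus and hyperparameters are illustrative):

```python
from gensim.models import FastText

sentences = [["the", "apple", "fell"], ["an", "apple", "a", "day"]]

# min_n/max_n control the character n-gram sizes used for sub-words
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=5)

# "appel" never appeared in training, but its character n-grams
# overlap with "apple" ("app", etc.), so it still gets a usable vector
print(model.wv["appel"][:5])
print(model.wv.similarity("apple", "appel"))  # typically high
```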