Word Embeddings
The transition from sparse, high-dimensional vectors to dense, continuous, low-dimensional vector spaces capable of capturing complex meaning.
Introduction to Word Embeddings
We've looked at One-Hot, Bag-of-Words (BoW), and TF-IDF encoding. All of these generate Sparse Vectors (mostly zeros) whose length equals the size of the vocabulary, often 50,000+ dimensions. Word Embeddings, popularized in 2013, represented a paradigm shift: migrating from Sparse Vectors to Dense Vectors.
Sparse Vector (One-Hot)
"King" = [0, 0, 1, 0, 0, 0, 0, 0, 0....]
"Man" = [0, 0, 0, 0, 0, 1, 0, 0, 0....]
Dense Vector (Embedding)
"King" = [0.98, 0.45, -0.6, 0.12, 0.8]
"Man" = [0.93, 0.41, -0.9, 0.15, 0.3]
How Dense Embeddings Work
Rather than counting words, an embedding model uses Neural Networks to map words into a continuous geometric space. Each dimension (number) in the fixed-length vector subtly captures a latent semantic feature (e.g., gender, royalty, color, sentiment).
- Because the dimensions are dense (floats, typically between -1 and 1, instead of mostly 0s), they compress vast vocabulary context into just a few hundred dimensions (300 is a common choice).
- Cosine Similarity, which compares the angle between two vectors, measures how conceptually similar two words are (see the sketch below).
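A minimal sketch of that measurement, reusing the toy "King" and "Man" vectors from above (the values are illustrative, not from a real model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

king = np.array([0.98, 0.45, -0.6, 0.12, 0.8])
man  = np.array([0.93, 0.41, -0.9, 0.15, 0.3])

print(cosine_similarity(king, man))  # ~0.92: high value -> similar concepts
```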
The Classic "Big 3" Static Embeddings
1. Word2Vec (2013)
Developed by Google
A predictive model that uses a shallow Neural Network to guess a word from its neighbors (the CBOW architecture) or the neighbors from the word (Skip-gram).
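As a rough sketch of what training looks like in practice, here is how the gensim library's Word2Vec class could be used on a toy corpus (the sentences and hyperparameters are illustrative):

```python
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "man", "walks", "to", "the", "castle"],
    ["the", "queen", "rules", "beside", "the", "king"],
]

# sg=1 selects Skip-gram (predict neighbors from the word);
# sg=0 would select CBOW (predict the word from its neighbors)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv["king"][:5])           # first 5 dims of the dense vector
print(model.wv.most_similar("king"))  # nearest neighbors in the space
```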
2. GloVe (2014)
Developed by Stanford
A count-based model that performs matrix factorization on a gigantic global word Co-occurrence Matrix to derive vectors.
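GloVe vectors are usually downloaded pretrained rather than trained locally. A minimal loading sketch, assuming a standard GloVe text file where each line holds a word followed by its floats (the filename is a placeholder for one of the Stanford releases):

```python
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Parse a GloVe text file: one word followed by its floats per line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Hypothetical path to a pretrained file from the Stanford GloVe project
vectors = load_glove("glove.6B.100d.txt")
print(vectors["king"].shape)  # (100,)
```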
3. FastText (2016)
Developed by Facebook AI
An extension of Word2Vec that trains on sub-word character N-grams (e.g., "apple" → "app", "ppl", "ple"). Because unseen words share n-grams with known ones, it can produce vectors for out-of-vocabulary words and even spelling errors.
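As a hedged sketch of that behavior, gensim's FastText class can produce a vector for a misspelling that never appeared in training (the toy corpus and hyperparameters are illustrative):

```python
from gensim.models import FastText

sentences = [["the", "apple", "fell"], ["an", "apple", "a", "day"]]

# min_n/max_n control the character n-gram sizes used for sub-words
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=5)

# "appel" never appeared in training, but its character n-grams
# overlap with "apple" ("app", etc.), so it still gets a usable vector
print(model.wv["appel"][:5])
print(model.wv.similarity("apple", "appel"))  # typically high
```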