RNN for NLP
Explore vanilla recurrent networks and bidirectional variants.
Recurrent Neural Networks (RNN)
Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed specifically for processing sequential data, such as time series, audio, or natural language text.
How RNNs Differ from Feed-Forward Nets
Feed-Forward (Standard DNN)
A standard neural network processes the entire input all at once and has a fixed input size. It has no concept of "memory" or order.
Recurrent (RNN)
RNNs process sequences step by step. They maintain a Hidden State (a memory vector) that is updated at every step as the network reads through the sentence word by word.
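As a minimal sketch of that update in plain NumPy (weight shapes here are illustrative; a real layer learns W_x, W_h, and b during training):
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state mixes the current word vector with the previous memory
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Illustrative sizes: 32-dim word vectors, 64-dim hidden state
rng = np.random.default_rng(0)
W_x = rng.normal(size=(32, 64)) * 0.1
W_h = rng.normal(size=(64, 64)) * 0.1
b = np.zeros(64)

h = np.zeros(64)                       # memory starts empty
for x_t in rng.normal(size=(10, 32)):  # a toy "sentence" of 10 word vectors
    h = rnn_step(x_t, h, W_x, W_h, b)  # memory is updated word by word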
Level 1 — Building an RNN in Keras
RNNs excel at sequence classification (like sentiment analysis). Here, we process words sequentially to determine if a review is positive or negative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding
vocab_size = 10000
embedding_dim = 32
max_sequence_length = 100
model = Sequential([
# Turn positive integers (word indices) into dense vectors of fixed size
Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_sequence_length),
# Vanilla RNN layer: maintains a 64-dimension hidden state across time steps
SimpleRNN(64, return_sequences=False),
# Binary classification output layer
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
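To put the model to work, a minimal training sketch on the IMDB reviews bundled with Keras could look like this (epochs and batch size are illustrative, not tuned):
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Reviews arrive as lists of word indices; keep only the 10,000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

# Pad/truncate every review to the fixed length expected by the Embedding layer
x_train = pad_sequences(x_train, maxlen=max_sequence_length)
x_test = pad_sequences(x_test, maxlen=max_sequence_length)

model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.2)
model.evaluate(x_test, y_test)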
Level 2 — Bidirectional RNNs
A standard RNN only knows what came before the current word. A Bidirectional RNN processes the sentence forwards AND backwards simultaneously and concatenates the two hidden states, so every word is read with context from both the start and the end of the sentence!
from tensorflow.keras.layers import Bidirectional
bi_model = Sequential([
Embedding(input_dim=vocab_size, output_dim=embedding_dim),
# By wrapping the RNN in Bidirectional, Keras automatically handles
# the forward and backward passes and combines them.
Bidirectional(SimpleRNN(64)),
Dense(1, activation='sigmoid')
])
bi_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
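By default the wrapper concatenates the forward and backward hidden states, so the 64-unit SimpleRNN yields a 128-dimension feature vector per sequence. A quick standalone check (the dummy shapes are chosen just for illustration):
import numpy as np
from tensorflow.keras.layers import Bidirectional, SimpleRNN

layer = Bidirectional(SimpleRNN(64))                   # default merge_mode='concat'
dummy = np.random.randn(2, 100, 32).astype("float32")  # (batch, time steps, features)
print(layer(dummy).shape)                               # (2, 128): forward 64 + backward 64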
The Core Flaw: The Vanishing Gradient Problem
Why not stop at vanilla RNNs? As an RNN processes a very long sequence (like a paragraph), backpropagation multiplies the gradient by one factor per time step. When those factors are smaller than 1, the product quickly shrinks toward 0 (the Vanishing Gradient). Result: a vanilla RNN has only short-term memory and has forgotten the beginning of a sentence by the time it reaches the end. This led to the creation of LSTMs!
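A back-of-the-envelope illustration: suppose each backprop step scales the gradient by roughly 0.9 (a made-up but plausible per-step factor).
# After backpropagating through a 100-word sequence:
grad_scale = 0.9 ** 100
print(grad_scale)  # ~2.7e-05, so the signal from the first word is effectively gone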