NLP Deep Learning MCQ · test your knowledge
From word embeddings to Transformers – 15 questions covering RNNs, LSTMs, attention, BERT, and modern NLP.
Deep Learning for Natural Language Processing
Deep learning revolutionized NLP by enabling models to learn hierarchical representations of text. From word embeddings (Word2Vec, GloVe) through recurrent models (RNN, LSTM) to Transformer‑based models (BERT, GPT), this MCQ tests your understanding of how neural networks process language.
Why deep learning for NLP?
Traditional NLP relied on hand‑crafted features. Deep learning automatically learns useful representations, capturing syntax, semantics, and context directly from raw text.
NLP deep learning glossary – key concepts
Word embeddings
Dense vector representations of words (e.g., Word2Vec, GloVe, fastText) that capture semantic similarity.
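"Semantic similarity" in embedding space is usually measured with cosine similarity. A minimal sketch, using made‑up 4‑dimensional vectors (real Word2Vec/GloVe embeddings have 100–300 dimensions, and these values are purely illustrative):

```python
import numpy as np

# Toy embeddings; the values are invented for illustration only.
emb = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.7, 0.1, 0.3]),
    "apple": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 = same direction, 0.0 = orthogonal.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # high: semantically related
print(cosine(emb["king"], emb["apple"]))  # low: unrelated
```

Related words end up with nearby vectors, so their cosine similarity is close to 1.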
RNN / LSTM / GRU
Recurrent architectures process sequences by maintaining a hidden state that is updated at each time step. LSTMs and GRUs add gating mechanisms that mitigate the vanishing‑gradient problem of vanilla RNNs.
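The recurrence itself is one matrix update per time step. A minimal vanilla‑RNN sketch with randomly initialized weights (a trained model would learn them; the sizes are arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4  # input and hidden sizes (illustrative)

# Random weights standing in for learned parameters.
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    # Vanilla RNN recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(d_h)                       # initial hidden state
for x_t in rng.normal(size=(5, d_in)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h)
print(h.shape)  # (4,): one hidden state summarizing the whole sequence
```

Because gradients flow back through the same `W_hh` at every step, they can shrink exponentially; LSTM/GRU gates give gradients a more direct path.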
Attention mechanism
Allows the model to focus on relevant parts of the input when producing each output. Key component of Transformers.
Transformer
Architecture based solely on attention (no recurrence or convolution), enabling parallel processing of entire sequences and capturing long‑range dependencies.
BERT (Bidirectional Encoder Representations from Transformers)
Pretrained Transformer encoder that is fine‑tuned for downstream NLP tasks. Pretrained with masked language modeling (and next‑sentence prediction).
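In masked language modeling, a fraction of tokens is hidden and the model must recover them. A toy sketch of BERT‑style masking (the canonical scheme selects ~15% of positions; of those, 80% become `[MASK]`, 10% a random token, 10% stay unchanged — here the "vocabulary" is just the sentence itself, an illustrative shortcut):

```python
import random

random.seed(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]  # toy sentence

masked, labels = [], []
for tok in tokens:
    if random.random() < 0.15:          # ~15% become prediction targets
        labels.append(tok)              # model must predict the original
        r = random.random()
        if r < 0.8:
            masked.append("[MASK]")
        elif r < 0.9:
            masked.append(random.choice(tokens))  # random replacement
        else:
            masked.append(tok)          # keep as-is
    else:
        labels.append(None)             # no loss at this position
        masked.append(tok)
print(masked)
```

The loss is computed only at the selected positions, so the encoder learns bidirectional context for every token.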
GPT (Generative Pretrained Transformer)
Autoregressive Transformer decoder that generates text one token at a time, each conditioned on the tokens before it.
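The decoding loop can be shown without any neural network at all. Here a hand‑written bigram table stands in for a trained GPT (purely illustrative); the point is that each new token is conditioned on what was generated so far:

```python
# Toy "language model": a bigram lookup table (not a real GPT).
bigram = {
    "<s>": "the", "the": "cat", "cat": "sat",
    "sat": "on", "on": "mat", "mat": "</s>",
}

def generate(start="<s>", max_len=10):
    out, tok = [], start
    # Autoregressive decoding: each step feeds the previous token back in.
    while len(out) < max_len:
        tok = bigram.get(tok, "</s>")
        if tok == "</s>":
            break
        out.append(tok)
    return out

print(generate())  # ['the', 'cat', 'sat', 'on', 'mat']
```

A real GPT replaces the lookup table with a Transformer that outputs a probability distribution over the vocabulary, sampled or argmaxed at each step.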
Seq2seq with attention
Encoder‑decoder framework for machine translation, summarization, etc.
# Self-attention in a nutshell (conceptual)
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
# Each token attends to all tokens, weighted by similarity.
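The formula above is directly runnable. A minimal NumPy implementation of scaled dot‑product attention, with random matrices standing in for learned projections (the sizes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n, n) token-to-token similarity
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(42)
n, d_k = 4, 8  # 4 tokens, 8-dim queries/keys/values (illustrative)
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_k))

out, w = attention(Q, K, V)
print(out.shape)        # (4, 8): one output vector per token
print(w.sum(axis=-1))   # each row of attention weights sums to 1
```

The sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into saturation.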
Common NLP deep learning interview questions
- How do word embeddings capture semantic meaning?
- What are the advantages of LSTMs over vanilla RNNs?
- Explain the attention mechanism and its role in Transformers.
- Why do Transformers use positional encodings?
- What is the difference between BERT and GPT?
- How does masked language modeling work in BERT pretraining?