Mixed NLP Q&A – Set 1

Mixed NLP questions – core concepts set 1

20 mixed interview-style questions and answers that revisit key NLP topics from preprocessing and vectorization to sequence models, transformers and evaluation metrics.

1

Why is text normalization important in NLP preprocessing?

Answer: Normalization (like lowercasing, Unicode normalization and removing noise) reduces variation caused by formatting rather than meaning, helping models focus on genuine linguistic patterns instead of superficial differences.
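A minimal sketch of these normalization steps, assuming only lowercasing, Unicode NFKC normalization, and whitespace cleanup; real pipelines may add more (URL stripping, accent folding, etc.):

```python
import re
import unicodedata

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # unify equivalent Unicode forms
    text = text.lower()                         # remove case variation
    text = re.sub(r"\s+", " ", text).strip()    # collapse noisy whitespace
    return text

# "Cafe" + combining acute accent and precomposed "CAFÉ" normalize identically
print(normalize("  Cafe\u0301 "))  # café
print(normalize("CAFÉ"))           # café
```

After normalization, superficially different strings map to the same token stream, which is exactly the variation reduction the answer describes.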

2

What is the main difference between one-hot vectors and word embeddings?

Answer: One-hot vectors are sparse and carry no similarity information, whereas embeddings are dense, learned representations where semantically related words have similar vectors in continuous space.
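A toy contrast, assuming a 4-word vocabulary and hand-picked 2-d embeddings; the embedding values here are hypothetical (real embeddings are learned, e.g. by word2vec or a transformer):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vocab = ["cat", "dog", "car", "truck"]
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}
embed = {"cat": [0.9, 0.1], "dog": [0.85, 0.2],    # animals cluster together
         "car": [0.1, 0.9], "truck": [0.2, 0.95]}  # vehicles cluster together

print(cosine(one_hot["cat"], one_hot["dog"]))  # 0.0 — no similarity signal
print(cosine(embed["cat"], embed["dog"]))      # near 1 — semantically close
```

Every pair of distinct one-hot vectors is orthogonal, so similarity between words is invisible; the dense vectors make "cat" and "dog" measurably closer than "cat" and "car".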

3

Why is context important for word meaning in NLP?

Answer: Many words are polysemous; their meaning changes depending on surrounding words, so models that encode context (like contextual embeddings from transformers) better capture usage than static embeddings alone.

4

What is the vanishing gradient problem and which NLP models historically suffered from it?

Answer: In deep or long RNNs, gradients can shrink toward zero as they are backpropagated across many time steps, preventing the model from learning long-distance dependencies; vanilla RNNs suffered most, motivating the development of LSTMs and GRUs with gating mechanisms.
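A back-of-the-envelope illustration, assuming each time step contributes a Jacobian factor with norm around 0.9 (a made-up value): the gradient reaching the earliest step shrinks geometrically with sequence length.

```python
# Product of per-step gradient factors over increasingly long sequences
factor = 0.9
for steps in (10, 50, 100):
    print(steps, factor ** steps)  # shrinks toward zero as steps grow
```

With 100 steps the surviving gradient is on the order of 1e-5, which is why vanilla RNNs struggle to connect distant tokens; gating in LSTMs/GRUs creates paths where the effective factor stays near 1.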

5

How do attention mechanisms improve sequence-to-sequence models?

Answer: Attention allows the decoder to focus on relevant parts of the input at each step instead of compressing everything into a single vector, improving translation quality and enabling better long-range dependency modeling.
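A minimal scaled dot-product attention sketch in pure Python, assuming a single query attending over three key/value pairs; real implementations batch this as matrix multiplies.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]           # similarity of query to each key
    weights = softmax(scores)            # normalized attention weights
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

out, weights = attend([1.0, 0.0],
                      keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                      values=[[1.0], [2.0], [3.0]])
print(weights)  # highest weight on the first key, which matches the query
```

The decoder output is a weighted mix of the values, with most weight on whichever input position best matches the current query, rather than a single fixed summary vector.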

6

Why did transformers largely replace RNNs in many NLP tasks?

Answer: Transformers are more parallelizable, capture global context via self-attention and scale more effectively to large models and datasets, leading to better performance on many language understanding and generation benchmarks.

7

What is the purpose of positional encodings in self-attention models?

Answer: Because self-attention is order-agnostic by default, positional encodings or embeddings inject sequence order information, enabling the model to distinguish between different positions and reason about word order.
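A sketch of the sinusoidal encodings from the original transformer paper, where PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] uses cosine; each position gets a distinct vector that is simply added to the token embedding.

```python
import math

def positional_encoding(pos: int, d_model: int):
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(5, 4))  # a different vector for position 5
```

Because the vectors differ per position, the otherwise order-agnostic attention layers can tell "dog bites man" from "man bites dog"; learned positional embeddings are a common alternative.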

8

What is overfitting in NLP models and how can it be mitigated?

Answer: Overfitting occurs when a model learns patterns specific to the training set rather than generalizable structures; it can be mitigated with regularization, dropout, larger and more diverse data, early stopping and careful model selection.
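One of the listed mitigations, early stopping, can be sketched as follows, assuming a made-up validation-loss trajectory; training halts once the loss stops improving for `patience` consecutive epochs.

```python
def early_stop(val_losses, patience=2):
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0      # still improving: keep going
        else:
            since_best += 1
            if since_best >= patience:
                return epoch                # stop: model started overfitting
    return len(val_losses) - 1

losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7]  # validation loss rises after epoch 2
print(early_stop(losses))  # 4
```

The checkpoint saved at the best epoch (here, epoch 2) is the one deployed, so the model never trains into the region where it memorizes the training set.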

9

Why are pretrained language models fine-tuned rather than trained from scratch for most tasks?

Answer: Pretraining on large corpora learns general language knowledge; fine-tuning adapts this knowledge to specific tasks with relatively little labeled data, saving compute and usually achieving better performance than training anew.

10

What is the difference between generative and discriminative NLP models?

Answer: Generative models (like language models) model the joint distribution over text and can generate samples, while discriminative models predict labels or outputs given inputs, often achieving higher accuracy on specific tasks with less modeling complexity.

11

Why is tokenization a critical design choice for NLP systems?

Answer: Tokenization defines the units that models see; it affects vocabulary size, handling of rare words, efficiency and evaluation metrics, so consistent, well-chosen tokenization is vital for stable training and fair comparisons.
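A toy greedy longest-match subword tokenizer (in the spirit of WordPiece), assuming a hand-written vocabulary; real vocabularies are learned from corpus statistics (BPE, WordPiece, unigram LM).

```python
def tokenize(word, vocab):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")          # no vocabulary piece matched
            i += 1
    return tokens

vocab = {"un", "believ", "able", "token", "ization", "s"}
print(tokenize("unbelievable", vocab))   # ['un', 'believ', 'able']
print(tokenize("tokenizations", vocab))  # ['token', 'ization', 's']
```

Subword splitting like this lets a small vocabulary cover rare and unseen words, which is why the vocabulary choice directly affects model size, rare-word handling, and sequence lengths.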

12

What is transfer learning and why is it powerful in NLP?

Answer: Transfer learning reuses representations or weights from a model trained on one task or dataset for another, letting downstream tasks benefit from knowledge captured during large-scale pretraining and reducing data requirements.

13

What does it mean when a model “hallucinates” in text generation?

Answer: Hallucination occurs when a model confidently produces plausible-sounding but factually incorrect or unsupported statements, reflecting that language models generate based on patterns rather than guaranteed factual grounding.

14

How are evaluation metrics like BLEU and ROUGE different from accuracy?

Answer: Accuracy applies to discrete label predictions, while BLEU and ROUGE operate on generated text, measuring n-gram or sequence overlap with references to approximate quality in tasks like machine translation and summarization, where outputs are sequences rather than single labels.
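A minimal illustration of the overlap idea, assuming unigrams only: clipped precision of the candidate is the BLEU-1 flavor, recall against the reference is the ROUGE-1 flavor. Real BLEU adds higher-order n-grams and a brevity penalty.

```python
from collections import Counter

def unigram_precision_recall(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())       # clipped unigram matches
    precision = overlap / sum(cand.values())   # BLEU-1 flavor
    recall = overlap / sum(ref.values())       # ROUGE-1 flavor
    return precision, recall

p, r = unigram_precision_recall("the cat sat", "the cat sat on the mat")
print(p, r)  # 1.0 0.5
```

The candidate matches everything it produced (precision 1.0) but covers only half the reference (recall 0.5), showing why the two metrics answer different questions about the same output.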

15

Why is human evaluation often necessary for generative NLP?

Answer: Automatic metrics cannot fully capture nuances of meaning, coherence, factuality and style; human judgments provide richer feedback about whether generated outputs are truly useful and high quality for end users.

16

What is domain adaptation in NLP and how can it be done?

Answer: Domain adaptation tunes models trained on one domain to work well on another (e.g. news → medical) using fine-tuning on in-domain data, unsupervised adaptation, or mixing domain-specific and general corpora during training.

17

Why must we consider bias and fairness when deploying NLP models?

Answer: Models can reflect and amplify societal biases present in training data, leading to unfair or harmful outputs; responsible deployment requires auditing, mitigation techniques and careful monitoring for biased behavior across groups.

18

What is the difference between zero-shot, one-shot and few-shot learning with large language models?

Answer: Zero-shot uses only instructions, one-shot adds a single example and few-shot provides several examples in the prompt; each level supplies more in-context guidance about the task, which the model exploits through in-context learning.
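A sketch of how the three prompt styles differ structurally; the classification task and example pairs here are hypothetical.

```python
def build_prompt(instruction, examples, query):
    parts = [instruction]
    for x, y in examples:                    # 0, 1, or k demonstrations
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")  # the model completes this
    return "\n\n".join(parts)

instruction = "Classify the sentiment as positive or negative."
examples = [("Great movie!", "positive"), ("Terrible plot.", "negative")]

zero_shot = build_prompt(instruction, [], "I loved it.")
one_shot = build_prompt(instruction, examples[:1], "I loved it.")
few_shot = build_prompt(instruction, examples, "I loved it.")
print(few_shot)
```

The only difference between the three prompts is how many worked demonstrations precede the query; the model's weights are never updated.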

19

How does retrieval-augmented generation improve factual accuracy?

Answer: RAG retrieves relevant documents from an external knowledge base and feeds them to the generator, grounding answers in up-to-date information and reducing reliance on memorized or outdated knowledge in model parameters.
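A toy retrieval-augmented flow, assuming token-overlap retrieval over an in-memory document list; production systems use dense vector search over a large corpus and a real language model for the generation step.

```python
def retrieve(query, docs, k=1):
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)            # rank by shared tokens
    return scored[:k]

docs = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Transformers use self-attention to model context.",
]
query = "When did the Eiffel Tower open?"
context = retrieve(query, docs)[0]
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # the generator now answers grounded in the retrieved text
```

Because the relevant document is injected into the prompt, the answer can come from the retrieved text rather than from whatever (possibly stale) facts are stored in the model's parameters.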

20

Why is it useful to review mixed-topic Q&A when preparing for NLP interviews?

Answer: Mixed questions simulate real interviews where topics jump across the stack, helping you strengthen connections between concepts and recall key ideas quickly under less predictable questioning.

🔍 Mixed NLP concepts covered – Set 1

This page revisits core NLP concepts: preprocessing, embeddings, RNNs and transformers, transfer learning, evaluation metrics, domain adaptation, bias and retrieval-augmented generation, useful for broad interview-style review.

Preprocessing & tokenization
Embeddings & context
Sequence & attention models
Metrics & human eval
Domain adaptation & bias
RAG & few-shot use