NLP Basics Q&A
20 important question answering concepts with concise explanations. Read each question and review the short answer beneath it.
What is question answering (QA) in NLP?
Answer: QA is the task of taking a natural language question and returning an appropriate answer, often by reading a context passage (reading comprehension) or using a knowledge base (open-domain QA).
What is the difference between extractive and abstractive QA?
Answer: Extractive QA selects a span of text directly from the context as the answer, whereas abstractive QA may generate a new answer sentence that paraphrases or summarizes the relevant information.
What is SQuAD and why is it important?
Answer: SQuAD (Stanford Question Answering Dataset) is a large-scale reading comprehension benchmark where models read Wikipedia passages and answer crowd-sourced questions, and it became a standard for evaluating extractive QA models.
What is a span-based QA model?
Answer: A span-based QA model predicts the start and end positions of the answer within the context passage, effectively treating QA as span extraction rather than free-form generation.
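The span-selection step can be sketched in a few lines. This is a hypothetical inference-time snippet, not any particular model's code: given per-token start and end scores (which in practice come from a trained model), it picks the highest-scoring valid span with start ≤ end and a maximum span length.

```python
# Hypothetical sketch: selecting the best answer span from per-token
# start/end scores, as a span-based QA model would at inference time.

def best_span(start_scores, end_scores, max_len=15):
    """Return the (start, end) index pair maximizing start+end score,
    subject to start <= end and a maximum span length."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# Toy scores over a 6-token context: the model is most confident
# that the answer starts at token 2 and ends at token 3.
start = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0]
end   = [0.0, 0.1, 0.4, 2.5, 0.3, 0.1]
print(best_span(start, end))  # (2, 3)
```

Production systems typically restrict this search to the top-k start and end candidates rather than scoring every pair.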
How are BERT-style models used for extractive QA?
Answer: For extractive QA, BERT encodes the concatenated question and context, and two classifier heads predict start and end indices over the context tokens to mark the answer span.
What is open-domain QA?
Answer: Open-domain QA answers questions using a very large corpus or the web instead of a single given passage, usually combining a retriever (to find documents) and a reader (to extract or generate answers).
What is the difference between closed-book and open-book QA?
Answer: In closed-book QA the model must answer using only its internal parameters (no external context at inference time), while in open-book QA it is given supporting documents or can retrieve them at inference time.
Why is tokenization important for QA models?
Answer: QA models often predict start and end positions at the token level, so consistent tokenization (e.g. WordPiece or BPE) is critical to align model outputs with the original text spans.
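The alignment problem can be illustrated with offset mapping. The sketch below uses a simple whitespace tokenizer as a stand-in for WordPiece/BPE (an assumption for brevity); the key idea is that each token carries its character offsets, so a predicted token span can be mapped back to the exact original text.

```python
# Sketch of offset mapping: a whitespace tokenizer standing in for
# WordPiece/BPE. Each token records its (start_char, end_char) so a
# predicted token span can be mapped back to the original text.

def tokenize_with_offsets(text):
    tokens, offsets = [], []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # locate token in the raw string
        end = start + len(tok)
        tokens.append(tok)
        offsets.append((start, end))
        pos = end
    return tokens, offsets

context = "The Eiffel Tower is in Paris"
tokens, offsets = tokenize_with_offsets(context)

# Suppose the model predicts token span (1, 2); the offsets recover
# the exact character span from the original context.
s, e = 1, 2
answer = context[offsets[s][0]:offsets[e][1]]
print(answer)  # Eiffel Tower
```

Real subword tokenizers expose the same idea (e.g. an offset mapping per token), which is what lets a model trained on subword indices return human-readable answer strings.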
What is a “no-answer†case in QA?
Answer: In some datasets the correct behavior is to say that the context does not contain an answer; models handle this by predicting a special no-answer score or a span that maps to “no answerâ€.
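A common decision rule (in the style of SQuAD 2.0 systems) compares the best span score against a null score with a tuned threshold. This is a hedged sketch of that pattern; the scores and threshold are made up for illustration.

```python
# Hedged sketch of no-answer handling: answer only when the best span
# score beats the "null" (no-answer) score by a tuned margin.

def predict_or_abstain(best_span_score, null_score, threshold=0.0):
    """Return True if the model should answer, False to abstain."""
    return best_span_score - null_score > threshold

print(predict_or_abstain(5.2, 1.0))  # True  -> confident enough to answer
print(predict_or_abstain(1.5, 4.0))  # False -> predict "no answer"
```

The threshold is typically tuned on a development set to trade off answer coverage against precision.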
Which metrics are commonly used to evaluate extractive QA?
Answer: Exact Match (EM) and token-level F1 are the standard metrics: EM measures whether the predicted span matches a gold answer exactly, and F1 measures the degree of token overlap between them.
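Both metrics are simple to implement. The sketch below follows the standard definitions (token-level precision/recall harmonic mean for F1), though official evaluation scripts also apply answer normalization first.

```python
from collections import Counter

def exact_match(pred, gold):
    """1 if the strings match exactly, else 0."""
    return int(pred == gold)

def f1_score(pred, gold):
    """Token-level F1: harmonic mean of precision and recall
    over the bag of whitespace tokens."""
    pred_toks, gold_toks = pred.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("eiffel tower", "eiffel tower"))              # 1
print(round(f1_score("the eiffel tower", "eiffel tower"), 2))   # 0.8
```

Note that a partially correct span (e.g. including an extra article) gets zero EM but nonzero F1, which is why both numbers are usually reported together.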
What role does attention play in QA models?
Answer: Attention allows the model to focus on question-relevant parts of the context, aligning question tokens with context tokens to better locate the answer span or generate an answer.
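A minimal sketch of this alignment is dot-product attention: a question vector is scored against each context token vector, and a softmax turns the scores into weights that concentrate on question-relevant tokens. The 2-d embeddings below are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy dot-product attention: one question vector attending over
# three context token vectors (made-up 2-d embeddings).
question = [1.0, 0.0]
context = [[0.9, 0.1],   # relevant token: similar direction to the question
           [0.0, 1.0],   # irrelevant
           [0.1, 0.2]]   # irrelevant
scores = [sum(q * c for q, c in zip(question, ctx)) for ctx in context]
weights = softmax(scores)
print(weights.index(max(weights)))  # 0 -> most weight on the relevant token
```

Transformer QA models apply this idea at scale, with many attention heads and learned projections instead of raw embeddings.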
How does multi-hop QA differ from single-hop QA?
Answer: Multi-hop QA requires reasoning over multiple passages or sentences to answer a question, whereas single-hop QA can be solved using one local piece of evidence.
What is a reader–retriever architecture in QA?
Answer: A retriever first fetches candidate documents from a large corpus (e.g. using BM25 or dense retrieval), and then a reader model performs fine-grained extraction or generation over the retrieved passages.
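The retrieval half of this pipeline can be illustrated with a deliberately crude term-overlap scorer (a stand-in for BM25 or dense retrieval, not an implementation of either): rank documents by how many query terms they share, then hand the top hit to the reader.

```python
# Minimal term-overlap retriever: score each document by the number of
# shared query terms. A real system would use BM25 or dense vectors.

def retrieve(query, docs, k=1):
    """Return the k documents sharing the most terms with the query."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.lower().split())), i)
              for i, d in enumerate(docs)]
    scored.sort(reverse=True)
    return [docs[i] for _, i in scored[:k]]

docs = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "Mount Everest is the highest mountain.",
]
print(retrieve("Where is the Eiffel Tower?", docs))
```

In a full retriever-reader system, the reader (an extractive or generative model) would then process the returned passages to produce the final answer.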
Why is answer normalization used before computing EM/F1?
Answer: Normalization (e.g. lowercasing, removing punctuation and articles) reduces spurious mismatches so that semantically identical answers like “the Eiffel Tower” and “Eiffel Tower” are treated as equivalent.
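A sketch of SQuAD-style normalization makes the steps concrete (lowercase, strip punctuation, drop English articles, collapse whitespace); the exact rules vary by evaluation script.

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, strip punctuation,
    remove articles (a/an/the), and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

# Both surface forms normalize to the same string.
print(normalize_answer("the Eiffel Tower"))   # eiffel tower
print(normalize_answer("Eiffel Tower!"))      # eiffel tower
```

EM and F1 are then computed over these normalized strings rather than the raw model output.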
How can large language models be used for generative QA?
Answer: Large language models can be prompted with the question and optionally context to directly generate a natural language answer, often combining retrieval with generation for factual accuracy.
What are common sources of error in QA systems?
Answer: Typical errors include misunderstanding the question, attending to the wrong part of the context, predicting partially correct spans, and hallucinating unsupported answers in generative settings.
What is the difference between factoid and non-factoid QA?
Answer: Factoid QA expects short, factual answers (names, dates, entities), while non-factoid QA may require longer, explanatory answers such as “why” or “how” questions.
Why is domain adaptation important for QA models?
Answer: QA models trained on one domain (like Wikipedia) can degrade on another (like biomedical text), so adapting or fine-tuning on in-domain data improves robustness and accuracy.
What is conversational QA?
Answer: Conversational QA handles multi-turn dialogues where each question may depend on previous turns, requiring the model to track context, coreference and dialogue history.
How do retrieval-augmented generators (RAG) help QA?
Answer: RAG-style models combine a neural retriever with a generative model so answers are grounded in retrieved documents, improving factual correctness while keeping generation flexible.
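The overall RAG pattern can be sketched as retrieve-then-prompt. The snippet below uses a toy term-overlap retriever and a prompt template; the generative model call itself is omitted, since in practice an LLM would complete the prompt (everything here is a hypothetical illustration, not a specific library's API).

```python
# Hedged sketch of the retrieval-augmented QA pattern: fetch supporting
# text, then build a grounded prompt for a generative model.

def retrieve(query, docs):
    """Return the document sharing the most terms with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question, passage):
    """Format a grounded prompt; an LLM would complete the 'Answer:' line."""
    return f"Context: {passage}\nQuestion: {question}\nAnswer:"

docs = [
    "The Eiffel Tower was completed in 1889.",
    "The Great Wall of China is thousands of kilometers long.",
]
question = "When was the Eiffel Tower completed?"
prompt = build_prompt(question, retrieve(question, docs))
print("1889" in prompt)  # True -> the fact is grounded in the prompt
```

Because the supporting passage is placed in the prompt, the generator can copy or paraphrase grounded facts instead of relying solely on its parameters.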
🔍 Question answering concepts covered
This page covers question answering fundamentals: extractive vs abstractive QA, SQuAD-style reading comprehension, span prediction and evaluation with EM/F1.