k-NN Q&A 20 Core Questions
Interview Prep

k-Nearest Neighbors: Interview Q&A

Short questions and answers on k-NN: distance metrics, choosing k, feature scaling, and its use for classification and regression.

Tags: Distance, Neighbors, Classification, Regression
1 What is the basic idea of k-NN? ⚡ Beginner
Answer: k-NN predicts the label of a new point by looking at the k closest training examples and using a simple rule like majority vote or average.
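A minimal from-scratch sketch of this idea for classification, using plain NumPy and toy data (the function name `knn_predict` is illustrative, not a standard API):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two well-separated clusters
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=3))  # -> 0
```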
2 Is k-NN a lazy learner or eager learner? ⚡ Beginner
Answer: k-NN is a lazy learner: it does not build an explicit model during training and delays most work to prediction time.
3 Which distance metrics are commonly used in k-NN? âš¡ Beginner
Answer: Common choices are Euclidean, Manhattan and Minkowski distances; cosine distance is also used for text or high-dimensional data.
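A quick comparison of these metrics on two toy vectors, using SciPy's distance helpers:

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, minkowski, cosine

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

print(euclidean(a, b))       # L2 (straight-line) distance
print(cityblock(a, b))       # Manhattan (L1) distance
print(minkowski(a, b, p=3))  # Minkowski; p=2 gives Euclidean, p=1 Manhattan
print(cosine(a, b))          # cosine distance; ~0 here since b points the same way as a
```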
4 Why is feature scaling important for k-NN? 📊 Intermediate
Answer: Distance is sensitive to feature scales; without scaling, large-scale features dominate the distance computation.
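A small illustration with scikit-learn's `StandardScaler`; the toy matrix deliberately puts one feature in the thousands so the scale imbalance is obvious:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Feature 0 is in the thousands, feature 1 in single digits:
# unscaled Euclidean distances are dominated almost entirely by feature 0.
X = np.array([[1000.0, 1.0], [2000.0, 2.0], [1500.0, 9.0]])

X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)  # each column now has zero mean and unit variance
```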
5 How do you choose the value of k? 📊 Intermediate
Answer: k is typically chosen using cross-validation, trying several values and picking one that balances bias and variance.
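A typical sketch with scikit-learn's `GridSearchCV` on the built-in iris data; the candidate k values here are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several odd values of k and keep the best by 5-fold CV accuracy
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # e.g. {'n_neighbors': ...}
```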
6 What happens when k is too small or too large? 📊 Intermediate
Answer: Very small k leads to high variance and overfitting; very large k leads to high bias and oversmoothing.
7 Can k-NN be used for regression? ⚡ Beginner
Answer: Yes, k-NN regression predicts the average (or weighted average) target value of the neighbors.
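A minimal scikit-learn sketch on made-up 1-D data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# Prediction is the mean target of the k nearest neighbors
reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)
print(reg.predict([[2.5]]))  # average of the targets for x=2 and x=3
```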
8 How does k-NN handle categorical features? 📊 Intermediate
Answer: Categorical features are usually encoded (e.g., one-hot) and used with a suitable distance or similarity measure.
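One common pattern, sketched with scikit-learn's `OneHotEncoder` (the `sparse_output` argument assumes scikit-learn ≥ 1.2):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.neighbors import KNeighborsClassifier

colors = np.array([["red"], ["blue"], ["green"], ["red"]])
y = np.array([0, 1, 1, 0])

# One-hot encode so Euclidean distance treats all categories symmetrically
enc = OneHotEncoder(sparse_output=False)
X = enc.fit_transform(colors)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict(enc.transform([["blue"]])))  # -> [1]
```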
9 What is a weighted k-NN? 🔥 Advanced
Answer: Weighted k-NN assigns higher weights to closer neighbors when aggregating labels or target values.
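In scikit-learn this is a one-line switch: `weights="distance"` scales each neighbor's vote by the inverse of its distance, so very close neighbors can outvote several distant ones.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [0.1], [3.0], [3.1], [3.2]])
y = np.array([0, 0, 1, 1, 1])

# With uniform weights, 3 distant class-1 points would win 3-2;
# with distance weighting, the two very close class-0 points dominate.
clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
print(clf.predict([[0.05]]))  # -> [0]
```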
10 Why is k-NN sensitive to the curse of dimensionality? 🔥 Advanced
Answer: In high dimensions, points become almost equally distant, making “nearest” neighbors less meaningful and hurting performance.
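A small simulation (random points in the unit cube) shows this concentration effect: the gap between the nearest and farthest neighbor shrinks relative to the nearest distance as dimensionality grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))                      # uniform points in the unit cube
    dists = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from one point to the rest
    # This ratio shrinks toward 0 as d grows: "nearest" loses its meaning
    print(d, (dists.max() - dists.min()) / dists.min())
```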
11 Is k-NN fast or slow at prediction time? ⚡ Beginner
Answer: Prediction can be slow because k-NN typically needs to compute distances to many training points.
12 How can you speed up k-NN on large datasets? 🔥 Advanced
Answer: You can use indexing structures (k-d trees, ball trees), approximate nearest neighbor search or reduce dimensionality.
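A sketch of the indexing route with scikit-learn's `NearestNeighbors`; the dataset shape here is arbitrary:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).random((10000, 8))

# A ball tree is built once up front; queries then prune large parts of the
# training set instead of scanning every point
index = NearestNeighbors(n_neighbors=5, algorithm="ball_tree").fit(X)
dists, idx = index.kneighbors(X[:3])  # neighbors of the first 3 points
print(idx.shape)  # (3, 5)
```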
13 Does k-NN build a global or local model? 📊 Intermediate
Answer: k-NN is a local method; predictions are based only on the local neighborhood around each query point.
14 When is k-NN a good baseline algorithm? ⚡ Beginner
Answer: It’s a good baseline on moderate-size, low-dimensional datasets where distance makes sense and training time must be minimal.
15 Can k-NN handle multi-class problems? ⚡ Beginner
Answer: Yes, k-NN naturally extends to multi-class classification via majority vote among neighbors’ classes.
16 How does noise in the data affect k-NN? 📊 Intermediate
Answer: Noise can significantly affect predictions, especially for small k; larger k and smoothing help reduce sensitivity.
17 How do you handle tie-breaking in k-NN classification? 🔥 Advanced
Answer: You can use odd k, distance-weighted voting, or consistent tie-breaking rules (e.g., pick class with higher prior).
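One hypothetical tie-breaking rule, sketched from scratch: vote first, then break ties in favor of the class whose neighbors are closer overall (the helper name and rule are illustrative, not a standard convention):

```python
import numpy as np

def vote_with_tiebreak(neighbor_labels, neighbor_dists):
    """Majority vote; ties are broken by the smaller summed neighbor distance."""
    labels = np.unique(neighbor_labels)
    counts = np.array([(neighbor_labels == c).sum() for c in labels])
    tied = labels[counts == counts.max()]
    if len(tied) == 1:
        return tied[0]
    # Tie: prefer the class whose neighbors are closer overall
    sums = [neighbor_dists[neighbor_labels == c].sum() for c in tied]
    return tied[int(np.argmin(sums))]

# 2-2 vote; class 1's neighbors are closer in total, so it wins
print(vote_with_tiebreak(np.array([0, 0, 1, 1]),
                         np.array([0.1, 0.9, 0.2, 0.3])))  # -> 1
```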
18 What is the main memory drawback of k-NN? ⚡ Beginner
Answer: It needs to store the entire training set, which can be expensive for large datasets.
19 Give a simple real-world use case of k-NN. ⚡ Beginner
Answer: k-NN is used in recommendation systems, document similarity and basic anomaly detection based on local neighborhoods.
20 What is the key message to remember about k-NN? ⚡ Beginner
Answer: k-NN is a simple, intuitive, non-parametric method that works well when distance is meaningful and datasets are not too large or high-dimensional.

Quick Recap: k-NN

Think of k-NN as “show me similar examples and copy their labels”; the key is choosing the right distance, scaling and k to make similarity meaningful.