Linear Regression: Interview Q&A (20 Core Questions)

Short questions and answers on how linear regression works, its assumptions, and how to diagnose and improve models.

Topics: OLS · Cost Function · R² · Residuals
1 What is linear regression in one sentence? ⚡ Beginner
Answer: Linear regression models the target as a linear combination of input features plus an intercept term.
2 What is the typical cost function used for linear regression? ⚡ Beginner
Answer: The most common cost is the mean squared error (MSE) between predictions and true values.
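As a minimal numpy sketch with made-up values, the MSE is just the average of the squared prediction errors:

```python
import numpy as np

# Mean squared error: average of squared differences between
# predictions and true targets (illustrative values only).
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```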
3 How are parameters estimated in ordinary least squares (OLS)? 📊 Intermediate
Answer: OLS finds parameters that minimize the sum of squared residuals, often using a closed‑form solution (XᵀX)⁻¹Xᵀy or gradient‑based optimization.
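The normal-equations route can be sketched in a few lines of numpy; the data here is a synthetic noiseless line, so the fit recovers the true intercept and slope exactly:

```python
import numpy as np

# OLS via the normal equations: beta = (X^T X)^{-1} X^T y.
# Synthetic data where y = 1 + 2*x exactly, so the fit recovers [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones_like(x), x])

# np.linalg.solve is preferred over forming an explicit inverse
# for numerical stability.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # ≈ [1.0, 2.0]
```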
4 List two core assumptions of linear regression. ⚡ Beginner
Answer: Examples: (1) the relationship between X and y is linear, and (2) the residuals have constant variance (homoscedasticity).
5 What does the intercept term represent? ⚡ Beginner
Answer: The intercept is the predicted value of y when all input features are zero, which may not be meaningful if zero lies outside the observed range of the data.
6 What is multicollinearity and why is it a problem? 📊 Intermediate
Answer: Multicollinearity means features are highly correlated; it can make coefficient estimates unstable and hard to interpret.
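One standard diagnostic is the variance inflation factor (VIF). The sketch below is an illustrative numpy implementation with simulated data, where one feature is nearly a copy of another:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (plus an intercept)."""
    n, p = X.shape
    out = []
    for j in range(p):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1 - resid @ resid / np.sum((yj - yj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)               # independent feature
X = np.column_stack([x1, x2, x3])
print(vif(X))  # large VIFs for x1 and x2, near 1 for x3
```

A common rule of thumb flags VIF values above 5 or 10 as problematic.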
7 How can you detect non‑linearity in linear regression? 📊 Intermediate
Answer: Plot residuals vs predicted values or vs each feature; systematic curves or patterns indicate that the linear assumption may be violated.
8 What does a coefficient mean in linear regression? ⚡ Beginner
Answer: A coefficient indicates the change in the predicted target for a one‑unit increase in that feature, holding other features constant.
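This "one-unit increase" reading can be demonstrated directly with a hypothetical fitted model:

```python
import numpy as np

# Hypothetical fitted coefficients: intercept 1.0, slope 2.0.
# Raising the feature by one unit changes the prediction by the slope.
beta = np.array([1.0, 2.0])

def predict(x):
    return beta[0] + beta[1] * x

print(predict(5.0) - predict(4.0))  # 2.0, exactly the coefficient on x
```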
9 Why might you standardize features before linear regression? 📊 Intermediate
Answer: Standardization can improve numerical stability, interpretability of regularized models, and convergence of gradient-based solvers.
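Z-score standardization is a one-liner in numpy; the toy matrix below has columns on very different scales:

```python
import numpy as np

# Z-score standardization: subtract each column's mean and divide
# by its standard deviation, giving mean 0 and unit variance.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ≈ [0, 0]
print(X_std.std(axis=0))   # ≈ [1, 1]
```

In practice, the means and standard deviations should be computed on the training set only and reused to transform validation and test data.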
10 What is the difference between simple and multiple linear regression? ⚡ Beginner
Answer: Simple linear regression uses one predictor, while multiple linear regression uses two or more predictors to explain the target.
11 What is the role of residuals in regression analysis? 📊 Intermediate
Answer: Residuals (errors) are the difference between observed and predicted values; analyzing them helps check assumptions and model fit.
12 What is heteroscedasticity? 🔥 Advanced
Answer: Heteroscedasticity occurs when the variance of residuals is not constant across different levels of the predictors, violating a key OLS assumption.
13 How can you handle heteroscedasticity? 🔥 Advanced
Answer: Options include transforming the target (e.g., log), using weighted least squares, or switching to models robust to varying variance.
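Weighted least squares can be sketched with the closed form β = (XᵀWX)⁻¹XᵀWy. In this simulated example the noise variance is assumed known so the weights can be set to its inverse, which is an idealization for illustration:

```python
import numpy as np

# Weighted least squares: beta = (X^T W X)^{-1} X^T W y, with weights
# inversely proportional to each observation's error variance.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 100)
sigma = 0.1 * x                      # noise grows with x (heteroscedastic)
y = 1.0 + 2.0 * x + sigma * rng.normal(size=x.size)

X = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma**2)          # weight = 1 / variance (assumed known)

beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta)  # close to the true [1.0, 2.0]
```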
14 What is gradient descent in the context of linear regression? 📊 Intermediate
Answer: Gradient descent iteratively updates the coefficients in the direction that reduces the MSE cost, rather than solving for them in one closed‑form step.
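A minimal batch gradient-descent loop on the MSE cost, using a noiseless toy line so convergence to the true coefficients is easy to see:

```python
import numpy as np

# Batch gradient descent on the MSE cost for linear regression.
# The gradient of mean((X beta - y)^2) w.r.t. beta is (2/n) X^T (X beta - y).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x                      # noiseless line for illustration
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
lr = 0.05                              # learning rate (chosen by hand)
for _ in range(5000):
    grad = 2.0 / len(y) * X.T @ (X @ beta - y)
    beta -= lr * grad

print(beta)  # converges toward [1.0, 2.0]
```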
15 How do regularized linear models differ from plain OLS? 🔥 Advanced
Answer: Regularized models (e.g., Ridge/Lasso) add a penalty on coefficient size to the loss function, which can reduce overfitting and handle multicollinearity better.
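The Ridge case has a closed form, (XᵀX + αI)⁻¹Xᵀy, which makes the shrinkage effect easy to show on a simulated one-feature problem (centered, with no intercept, to keep the sketch minimal):

```python
import numpy as np

# Ridge regression closed form: beta = (X^T X + alpha * I)^{-1} X^T y.
# The penalty shrinks coefficients toward zero; alpha = 0 recovers OLS.
rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 3.0 * x + 0.1 * rng.normal(size=50)
X = x.reshape(-1, 1)                 # single centered feature, no intercept

def ridge(X, y, alpha):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

print(ridge(X, y, 0.0))    # ≈ OLS estimate, near the true 3.0
print(ridge(X, y, 100.0))  # heavily shrunk toward 0
```

Lasso has no closed form (its penalty is non-differentiable at zero) and is typically solved by coordinate descent.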
16 Why might R² increase when you add more features, even useless ones? 🔥 Advanced
Answer: Plain R² never decreases when you add features because the model can always fit training data at least as well, even if the new features are noise.
17 What is adjusted R² and why is it preferred sometimes? 📊 Intermediate
Answer: Adjusted R² penalizes adding features that do not improve the model; it can decrease when unnecessary predictors are added, making it more suitable for model comparison.
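Both points from Q16 and Q17 can be checked numerically. The sketch below fits a simulated model with and without a pure-noise predictor; plain R² can only stay the same or rise, while adjusted R² applies the (n − 1)/(n − p − 1) penalty:

```python
import numpy as np

# R^2 = 1 - SS_res / SS_tot;
# adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where p is the number of predictors (excluding the intercept).
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def r2_scores(Xcols, y):
    X = np.column_stack([np.ones(len(y))] + Xcols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    p = len(Xcols)
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

noise = rng.normal(size=n)            # useless extra predictor
r2_1, adj_1 = r2_scores([x], y)
r2_2, adj_2 = r2_scores([x, noise], y)
print(r2_2 >= r2_1)   # plain R^2 never decreases when a feature is added
```

Adjusted R² is free to fall when the extra predictor adds nothing, which is what makes it more useful for comparing models of different sizes.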
18 How do you check if residuals are approximately normal? 🔥 Advanced
Answer: You can use histograms, Q‑Q plots, or statistical tests (e.g., Shapiro‑Wilk) to visually and numerically assess residual normality.
19 When might linear regression be a bad choice? 📊 Intermediate
Answer: It is a poor choice when the relationship is strongly non‑linear, heavily interaction‑driven, or when assumptions like homoscedasticity are badly violated.
20 Why is linear regression still important to learn? ⚡ Beginner
Answer: Linear regression is simple, interpretable and fast; it builds intuition for more complex models and remains a strong baseline in many practical problems.

Quick Recap: Linear Regression

Understand what the coefficients mean, when assumptions hold, and how to read residual plots—you will then be able to defend or reject linear regression confidently in interviews.