Scikit-Learn Q&A
20 Core Questions
Interview Prep
Scikit-Learn: Interview Q&A
Short questions and answers on using scikit-learn for practical machine learning in Python.
Estimators
Pipelines
CV & Search
Preprocessing
1
What is scikit-learn and when would you use it?
⚡ Beginner
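Answer: Scikit-learn is a widely used Python library for classical machine learning (linear models, trees, SVMs, clustering, preprocessing). Use it mainly for tabular data, when you want a consistent API for training, evaluating and tuning such models.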
2
What is the common estimator API pattern in sklearn?
⚡ Beginner
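Answer: Every estimator implements fit, which learns from data. Models add predict (and usually score); transformers add transform, plus fit_transform as a shortcut for fitting and transforming in one call.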
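Example: a minimal sketch of the pattern, assuming the built-in iris dataset and a LogisticRegression model chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer: fit learns the statistics, transform applies them.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit + transform in one call
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Model: fit learns the parameters, predict and score use them.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)
accuracy = clf.score(X_test_scaled, y_test)  # mean accuracy for classifiers
```

Note that the scaler is fit on the training split only; the test split is transformed with the statistics learned there.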
3
What is a transformer vs an estimator in sklearn?
📊 Intermediate
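Answer: In sklearn, an estimator is any object that learns from data via fit. Transformers are estimators that also implement transform (e.g., scaling, encoding); predictors (models) are estimators that implement predict. Some objects, such as KMeans, implement both.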
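Example: a small sketch that checks which methods a few common objects expose. The helper name capabilities is hypothetical, not part of sklearn.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def capabilities(obj):
    """Report which of the main sklearn methods an object exposes."""
    return {name: hasattr(obj, name) for name in ("fit", "transform", "predict")}


scaler_caps = capabilities(StandardScaler())      # fit + transform
model_caps = capabilities(LogisticRegression())   # fit + predict
kmeans_caps = capabilities(KMeans(n_clusters=2))  # fit + transform + predict
```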
4
Why are pipelines useful in scikit-learn?
📊 Intermediate
Answer: Pipelines chain preprocessing and modeling steps so you can fit and cross-validate the whole workflow safely without leakage.
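Example: a minimal sketch, assuming the built-in breast cancer dataset. The step names "scale" and "clf" are arbitrary labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The whole pipeline is cloned and re-fit per fold, so the scaler
# only ever sees the training part of each split.
scores = cross_val_score(pipe, X, y, cv=5)
mean_accuracy = scores.mean()
```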
5
What is ColumnTransformer and when would you use it?
📊 Intermediate
Answer: ColumnTransformer applies different transformers to different columns (e.g., scale numerics, one-hot encode categoricals) in a single pipeline.
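Example: a minimal sketch on a hypothetical toy table. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 52_000, 88_000, 61_000],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

Xt = preprocess.fit_transform(df)
# The result can be sparse or dense depending on its density.
Xt = Xt.toarray() if hasattr(Xt, "toarray") else np.asarray(Xt)
n_rows, n_cols = Xt.shape  # 2 scaled numeric columns + 3 one-hot city columns
```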
6
How do you perform cross-validation in sklearn?
⚡ Beginner
Answer: Use helpers like cross_val_score, cross_validate or pass a CV splitter to GridSearchCV / RandomizedSearchCV.
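Example: a minimal sketch of both helpers, assuming the iris dataset and a decision tree picked only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# cross_val_score returns one score per fold for a single metric.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# cross_validate can return several metrics plus fit and score times.
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
f1_per_fold = results["test_f1_macro"]
```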
7
What is GridSearchCV and why is it useful?
⚡ Beginner
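Answer: GridSearchCV evaluates every combination in a parameter grid with cross-validation. After fitting, it exposes the best combination (best_params_), its CV score (best_score_) and, with the default refit=True, a model retrained with those parameters on all the data passed to fit (best_estimator_).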
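Example: a minimal sketch, assuming an SVC on the iris dataset and a small grid chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_     # best combination found by CV
best_model = search.best_estimator_   # refit on all of X, y (refit=True)
n_candidates = len(search.cv_results_["params"])  # 3 values of C * 2 kernels
```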
8
When would you prefer RandomizedSearchCV over GridSearchCV?
📊 Intermediate
Answer: When the parameter space is large or includes continuous ranges. RandomizedSearchCV samples a fixed number of combinations (n_iter), so the cost is set by your budget rather than by the size of the grid.
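Example: a minimal sketch that samples C and gamma from log-uniform distributions. The ranges are assumptions picked for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions instead of fixed lists: each candidate draws one value.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions,
    n_iter=10,        # total number of sampled combinations
    cv=5,
    random_state=0,   # makes the sampling reproducible
)
search.fit(X, y)
n_candidates = len(search.cv_results_["params"])
```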
9
How do you handle class imbalance in sklearn classifiers?
📊 Intermediate
Answer: Use class_weight='balanced' (where supported), resample with imbalanced-learn, or adjust thresholds/metrics.
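Example: a minimal sketch comparing class_weight='balanced' with a lowered decision threshold. The synthetic data (roughly 5% positives) and the 0.2 threshold are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data, assumed for illustration.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))

# Alternative: keep the plain model but lower the decision threshold.
proba = plain.predict_proba(X_test)[:, 1]
y_pred_low_threshold = (proba >= 0.2).astype(int)
recall_low_threshold = recall_score(y_test, y_pred_low_threshold)
```

Both approaches trade precision for recall, so compare them with a metric that reflects the cost of each error type rather than plain accuracy.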
10
Why should preprocessing be inside the pipeline rather than done beforehand?
🔥 Advanced
Answer: Putting preprocessing in the pipeline ensures it is fit only on training folds during CV, preventing data leakage.
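Example: a sketch of the leakage effect on pure noise. It uses supervised feature selection as the preprocessing step because that makes the gap easy to see; the data sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise: no feature carries real information about y.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5000))
y = rng.randint(0, 2, size=100)

# Leaky: the selector sees every label, including the validation folds.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=5
).mean()

# Safe: selection is re-fit inside each training fold.
pipe = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest = cross_val_score(pipe, X, y, cv=5).mean()
```

The leaky score looks far better than chance even though the data is random, while the pipeline score stays near chance level.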
11
How do you save and load trained sklearn models?
⚡ Beginner
Answer: Typically using joblib.dump and joblib.load (or pickle with care).
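Example: a minimal sketch that saves a model to a temporary directory and loads it back. The file name model.joblib is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should behave exactly like the original.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Like pickle, joblib files can execute arbitrary code when loaded, so only load files you trust, and load them with the same sklearn version that saved them where possible.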
12
What are some key preprocessing utilities in sklearn?
⚡ Beginner
Answer: Important transformers: StandardScaler, MinMaxScaler, OneHotEncoder, OrdinalEncoder, SimpleImputer, PolynomialFeatures.
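Example: a short sketch of a few of these transformers on tiny hand-made arrays (the values are illustrative).

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, PolynomialFeatures

X_num = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0]])

# Fill the missing value with the column mean: (1 + 3) / 2 = 2.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_num)

# Rescale each column to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X_imputed)

# Degree 2 adds a bias column, squares and the interaction term:
# 1, a, b, a^2, a*b, b^2.
X_poly = PolynomialFeatures(degree=2).fit_transform(X_imputed)

# One-hot encode a categorical column; the default output is sparse.
colors = np.array([["red"], ["green"], ["red"]])
X_onehot = OneHotEncoder().fit_transform(colors).toarray()
```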
13
How do you access model coefficients or feature importances in sklearn?
📊 Intermediate
Answer: After fitting, many linear models expose coef_ (and intercept_), while tree-based models expose feature_importances_.
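Example: a minimal sketch reading both attributes after fitting, assuming the iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

data = load_iris()
X, y = data.data, data.target

linear = LogisticRegression(max_iter=1000).fit(X, y)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# coef_ has one row per class here: 3 classes x 4 features.
coef_shape = linear.coef_.shape

# feature_importances_ has one value per feature and sums to 1.
importances = dict(zip(data.feature_names, forest.feature_importances_))
```

Coefficients are only comparable across features when the features are on a similar scale.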
14
What is the purpose of the random_state parameter?
⚡ Beginner
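Answer: random_state seeds the pseudo-random number generator used for shuffling, splitting and stochastic algorithms. Passing a fixed integer makes model training and data splits reproducible across runs.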
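Example: a minimal sketch showing that the same seed yields the same split. The seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same seed, same split.
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
same_split = np.array_equal(X_test_a, X_test_b)
```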
15
How do you create a custom transformer in sklearn?
🔥 Advanced
Answer: Subclass BaseEstimator and TransformerMixin, implement fit (often returning self) and transform.
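Example: a minimal sketch of a custom transformer. The class ClipOutliers and its quantile parameters are hypothetical, invented for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each column to quantiles learned in fit."""

    def __init__(self, lower=0.05, upper=0.95):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with an underscore by sklearn convention.
        self.lower_ = np.quantile(X, self.lower, axis=0)
        self.upper_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.lower_, self.upper_)


X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# fit_transform comes from TransformerMixin.
clipped = ClipOutliers(lower=0.0, upper=0.75).fit_transform(X)

# The transformer drops into a pipeline like any built-in one.
pipe = make_pipeline(ClipOutliers(), StandardScaler())
Xt = pipe.fit_transform(X)
```

Because the quantiles are learned in fit and only applied in transform, the transformer stays leakage-safe when used inside cross-validation.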
16
How do you handle time series with sklearn to avoid leakage?
🔥 Advanced
Answer: Use TimeSeriesSplit or custom CV, create lag features, and ensure all transforms use only past data in each fold.
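Example: a minimal sketch with lag features and TimeSeriesSplit. The random-walk series and the Ridge model are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Hypothetical series; real data would come from your own source.
rng = np.random.RandomState(0)
series = pd.Series(np.cumsum(rng.normal(size=200)), name="value")

# Lag features only look backwards: row t uses values from t-1 and t-2.
frame = pd.DataFrame({
    "lag_1": series.shift(1),
    "lag_2": series.shift(2),
    "target": series,
}).dropna()
X = frame[["lag_1", "lag_2"]].to_numpy()
y = frame["target"].to_numpy()

cv = TimeSeriesSplit(n_splits=5)

# Every training window ends before its validation window starts.
ordered = all(train.max() < test.min() for train, test in cv.split(X))

scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_mean_absolute_error")
```

Rows must already be sorted by time, since TimeSeriesSplit splits by row position and never shuffles.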
17
When would you choose sklearn over deep learning frameworks?
📊 Intermediate
Answer: For tabular data, smaller datasets, quicker iteration and simpler deployment, sklearn models are often the best choice.
18
Give an example of a full sklearn workflow from raw data to model.
🔥 Advanced
Answer: Typical flow: train_test_split → ColumnTransformer (impute+scale/encode) → Pipeline with model → cross-validation / GridSearchCV → fit best model → evaluate on test.
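Example: a compact sketch of that flow. The synthetic table, its column names and the LogisticRegression model are all assumptions standing in for real data and a real model choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for a real raw table.
rng = np.random.RandomState(0)
n = 300
df = pd.DataFrame({
    "age": rng.randint(18, 70, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["Paris", "Lyon", "Nice"], size=n),
})
df.loc[rng.choice(n, size=20, replace=False), "income"] = np.nan  # missing values
y = (df["age"] + rng.normal(0, 10, size=n) > 45).astype(int)

# 1. Hold out a test set before any fitting happens.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Preprocessing: impute + scale numerics, one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# 3. Full pipeline: preprocessing + model.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 4. Tune with cross-validation; parameters use <step name>__<parameter>.
search = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # refit=True retrains the best pipeline on X_train

# 5. Evaluate once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
```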
19
What are some common mistakes when using sklearn?
🔥 Advanced
Answer: Common mistakes: data leakage from preprocessing outside pipelines, improper CV, not scaling when needed, ignoring class imbalance.
20
What is the key message to remember about scikit-learn?
⚡ Beginner
Answer: Scikit-learn provides a clean, consistent API; mastering estimators, pipelines and CV lets you build robust ML workflows quickly.
Quick Recap: Scikit-Learn
Think in terms of transformers + estimators + pipelines; this mindset helps you structure nearly any classical ML project in sklearn.