Machine Learning Exercises
Use these exercises to reinforce your understanding of ML theory, algorithms and implementation details.
Topic‑1: ML Basics & Theory
- Define supervised, unsupervised and reinforcement learning. For each, give two real‑world examples.
- Explain bias‑variance trade‑off. For three different models (linear, tree, deep net), describe where they typically sit on this spectrum.
- List at least five common sources of data leakage in ML projects and propose a mitigation for each.
- For classification, compare Logistic Regression, k‑NN and Decision Trees in terms of interpretability, training speed and robustness to noise.
Topic‑2: Regression
- On a housing dataset, split into train/validation. Train Linear Regression, Ridge and Lasso; compare RMSE and discuss which features are shrunk or removed.
- Implement gradient descent for univariate Linear Regression in NumPy and verify that the solution matches the closed‑form solution.
- Create polynomial features (degree 2, 3) for a synthetic 1D dataset and show how training/validation error changes with degree and regularization.
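For the gradient‑descent exercise above, here is a minimal NumPy sketch on synthetic data; the learning rate, epoch count and noise level are arbitrary choices for illustration, not part of the exercise statement:

```python
import numpy as np

# Synthetic data: y = 2.5x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2.5 * x + 1.0 + rng.normal(0, 0.1, 200)

def gradient_descent(x, y, lr=0.1, epochs=2000):
    """Fit y = w*x + b by full-batch gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        err = w * x + b - y
        w -= lr * (2 / n) * (err @ x)   # dMSE/dw
        b -= lr * (2 / n) * err.sum()   # dMSE/db
    return w, b

# Closed-form solution via least squares on the design matrix [x, 1]
X = np.column_stack([x, np.ones_like(x)])
w_cf, b_cf = np.linalg.lstsq(X, y, rcond=None)[0]

w_gd, b_gd = gradient_descent(x, y)
```

Comparing `w_gd, b_gd` against `w_cf, b_cf` is exactly the verification the exercise asks for: on this well‑conditioned 1D problem the two should agree to several decimal places.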
Topic‑3: Classification & Metrics
- Write a function that, given y_true and y_pred, computes accuracy, precision, recall, F1‑score and confusion matrix (without using sklearn.metrics).
- On the Titanic or a similar dataset, train at least three classifiers (Logistic Regression, Random Forest, SVM) and compare ROC‑AUC and PR‑AUC.
- Plot ROC and Precision‑Recall curves for an imbalanced dataset and explain when PR‑AUC is more informative than ROC‑AUC.
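One possible shape for the from‑scratch metrics function (binary labels assumed; the dict return format is just a convenience, not required by the exercise):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and confusion matrix for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Confusion matrix in sklearn's row-is-truth convention: [[tn, fp], [fn, tp]]
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "confusion": [[tn, fp], [fn, tp]]}

m = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

A good follow‑up is to cross‑check the output against `sklearn.metrics` on a few random label vectors, including edge cases where a class is never predicted.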
Topic‑4: Data Preprocessing & Feature Engineering
- Given a mixed‑type tabular dataset, design a preprocessing pipeline that imputes missing values, scales numeric features and encodes categoricals. Implement it with ColumnTransformer + Pipeline.
- Create at least five new domain‑inspired features for the House Price dataset and show how they impact model performance.
- Demonstrate the effect of feature scaling on k‑NN and SVM by training with and without scaling and comparing results.
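A skeleton for the ColumnTransformer + Pipeline exercise, on a tiny made‑up frame (the column names, imputation strategies and classifier are placeholders you would replace with your own dataset's):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy mixed-type frame with missing values in both kinds of columns
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 51, 33, 29],
    "income": [40_000, 52_000, np.nan, 88_000, 61_000, 45_000],
    "city":   ["A", "B", "A", np.nan, "B", "A"],
})
y = np.array([0, 1, 0, 1, 1, 0])

numeric, categorical = ["age", "income"], ["city"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(df, y)
```

Keeping imputation and scaling inside the pipeline (rather than applied to the whole frame up front) is what prevents the train/validation leakage discussed in Topic‑1.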
Topic‑5: Time Series
- Take a univariate time series (e.g., daily sales). Create lag and rolling‑window features and train a tree‑based regressor for one‑step‑ahead forecasting using proper time‑based splits.
- Implement a naive, seasonal naive and simple moving‑average forecast and compare them as baselines against your ML model.
- Perform a train/validation backtest with a rolling window (e.g., 3 folds) and compute MAE/RMSE for each fold.
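The three baselines can be evaluated one‑step‑ahead on a hold‑out tail of the series; a sketch on a synthetic weekly‑seasonal series (season length, window and test size are illustrative defaults):

```python
import numpy as np

def one_step_baselines(y, season=7, window=3, n_test=14):
    """MAE of naive, seasonal-naive and moving-average one-step forecasts
    over the last n_test observations (a simple time-based split)."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y) - n_test, len(y))
    naive = y[idx - 1]                                  # last value
    seasonal = y[idx - season]                          # value one season ago
    moving = np.array([y[i - window:i].mean() for i in idx])
    actual = y[idx]
    mae = lambda f: float(np.mean(np.abs(actual - f)))
    return {"naive": mae(naive), "seasonal_naive": mae(seasonal),
            "moving_avg": mae(moving)}

# Synthetic "daily sales" with weekly seasonality plus noise
rng = np.random.default_rng(1)
t = np.arange(120)
y = 10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, 120)
scores = one_step_baselines(y)
```

On a strongly seasonal series like this one the seasonal‑naive baseline should clearly beat the plain naive forecast; your ML model in the first exercise has to beat all three to justify its complexity.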
Topic‑6: NLP
- Build a simple spam/ham SMS classifier using bag‑of‑words + Naive Bayes; then upgrade to TF‑IDF and compare metrics.
- Given a small text corpus, experiment with different tokenization strategies (word, subword, character) and discuss pros/cons.
- Use a pre‑trained transformer (e.g., BERT via Hugging Face) and fine‑tune it for sentiment analysis on a small dataset; measure improvement over classical models.
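The bag‑of‑words vs. TF‑IDF comparison fits in a few lines with sklearn pipelines; the eight‑sentence corpus below is a made‑up stand‑in for the real SMS Spam Collection dataset the exercise intends:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (real exercise: use the SMS Spam Collection)
texts = [
    "win a free prize now", "free cash claim your prize",
    "urgent winner claim free entry", "call now to win cash",
    "are we still meeting for lunch", "see you at the office tomorrow",
    "can you send me the report", "lets have dinner tonight",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

bow_model = make_pipeline(CountVectorizer(), MultinomialNB())
bow_model.fit(texts, labels)

tfidf_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
tfidf_model.fit(texts, labels)
```

With a real dataset you would hold out a test split and compare precision/recall of the two vectorizers rather than eyeballing predictions.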
Topic‑7: Neural Networks & Deep Learning
- Implement a fully‑connected neural network for MNIST using a deep learning framework of your choice; experiment with different activations and regularization (dropout, weight decay).
- Plot training and validation loss curves; identify and fix overfitting using early stopping and data augmentation.
- Re‑implement forward and backward passes for a simple 2‑layer network in pure NumPy to solidify your understanding of backpropagation.
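A compact version of the pure‑NumPy 2‑layer network (ReLU hidden layer, MSE loss); the architecture, learning rate and synthetic target are arbitrary choices to keep the sketch self‑contained:

```python
import numpy as np

rng = np.random.default_rng(0)

def init(d_in, d_hid, d_out):
    """Small random weights, zero biases."""
    return {"W1": rng.normal(0, 0.5, (d_in, d_hid)), "b1": np.zeros(d_hid),
            "W2": rng.normal(0, 0.5, (d_hid, d_out)), "b2": np.zeros(d_out)}

def forward(p, X):
    z1 = X @ p["W1"] + p["b1"]
    h = np.maximum(z1, 0.0)                 # ReLU
    out = h @ p["W2"] + p["b2"]
    return out, (X, z1, h)                  # cache for backward

def backward(p, cache, grad_out):
    """Backprop given dLoss/dout; returns grads for all parameters."""
    X, z1, h = cache
    g = {"W2": h.T @ grad_out, "b2": grad_out.sum(0)}
    dz1 = (grad_out @ p["W2"].T) * (z1 > 0)  # ReLU gradient
    g["W1"] = X.T @ dz1
    g["b1"] = dz1.sum(0)
    return g

# Fit the (linear) target y = x1 - x2 with full-batch gradient descent
X = rng.normal(size=(256, 2))
y = X[:, :1] - X[:, 1:2]
p = init(2, 16, 1)
for _ in range(500):
    pred, cache = forward(p, X)
    grad_out = 2 * (pred - y) / len(X)       # dMSE/dpred
    g = backward(p, cache, grad_out)
    for k in p:
        p[k] -= 0.1 * g[k]
```

A worthwhile extra check is comparing each analytic gradient against a finite‑difference estimate on a tiny batch; agreement to ~1e‑6 is strong evidence the backward pass is correct.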
Topic‑8: Pandas, NumPy & Scikit‑Learn
- Using NumPy only, implement standardization and min‑max scaling functions and verify against sklearn’s StandardScaler and MinMaxScaler.
- With pandas, load a raw CSV, perform exploratory analysis (missing values, distributions, correlations) and summarize key data quality issues.
- Build a complete sklearn Pipeline (preprocessing + model), wrap it in GridSearchCV and report the best configuration and scores.
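The scaling exercise reduces to two one‑liners; note that StandardScaler uses the population standard deviation (`ddof=0`), which is what makes the verification match:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def standardize(X):
    """Zero mean, unit variance per column (population std, ddof=0)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """Rescale each column to the [0, 1] range."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

rng = np.random.default_rng(42)
X = rng.normal(10, 3, size=(50, 4))
```

Checking `np.allclose(standardize(X), StandardScaler().fit_transform(X))` (and likewise for min‑max) completes the verification step of the exercise.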
Topic‑9: Mini Projects & MLOps
- Build a small REST API (FastAPI or Flask) that serves predictions from a trained model, including basic input validation and logging.
- Create a notebook that benchmarks multiple models on the same dataset with clear visualizations and a short written report of conclusions.
- Take an existing Kaggle notebook, refactor it into reusable functions/modules, and add at least two improvements (better features, tuning, or evaluation).
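The validation‑and‑logging core of the REST API exercise can be prototyped without any web framework; this sketch uses a made‑up two‑feature schema and a dummy model, and the `predict_endpoint` function is what a FastAPI or Flask route handler would wrap:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-api")

# Hypothetical schema: the model expects these numeric features
REQUIRED_FEATURES = ["age", "income"]

def validate(payload):
    """Return (feature_list, error_message). Checks presence and type."""
    if not isinstance(payload, dict):
        return None, "payload must be a JSON object"
    missing = [f for f in REQUIRED_FEATURES if f not in payload]
    if missing:
        return None, f"missing features: {missing}"
    bad = [f for f in REQUIRED_FEATURES
           if isinstance(payload[f], bool)
           or not isinstance(payload[f], (int, float))]
    if bad:
        return None, f"non-numeric features: {bad}"
    return [float(payload[f]) for f in REQUIRED_FEATURES], None

def predict_endpoint(payload, model_predict):
    """What a POST /predict handler would do, minus the web framework:
    returns (response_body, http_status)."""
    features, err = validate(payload)
    if err:
        logger.warning("rejected request: %s", err)
        return {"error": err}, 400
    pred = model_predict(features)
    logger.info("served prediction %s for %s", pred, features)
    return {"prediction": pred}, 200

# Stand-in for a trained model loaded from disk
dummy_model = lambda feats: 1 if feats[1] > 50_000 else 0
```

In the full exercise you would replace `dummy_model` with a pickled pipeline and expose `predict_endpoint` through a framework route, keeping validation and logging exactly as structured here.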