
Core Machine Learning Concepts

Learn what machine learning is, the main types of ML, the basic supervised-learning pipeline, and key terminology, illustrated with a simple linear regression example in Python.

What is Machine Learning?

Machine Learning (ML) is about teaching computers to learn patterns from data instead of programming them with hard-coded rules. A model learns a mapping from inputs X (features) to an output y (the target), which it can then apply to new inputs it has not seen before.
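
To make the contrast concrete, here is a minimal pure-Python sketch. The data and the names `hard_coded_rule` and `learn_threshold` are hypothetical: one function uses a cutoff a programmer chose by hand, the other picks the cutoff that makes the fewest mistakes on labeled examples.

```python
# Hypothetical toy data: hours studied, and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

def hard_coded_rule(h):
    """A rule a programmer writes by hand, without looking at the data."""
    return 1 if h >= 6 else 0

def learn_threshold(xs, ys):
    """Learn a cutoff from data: try each observed value as a threshold
    and keep the one that makes the fewest mistakes on the labeled examples."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(hours, passed)
print("Learned threshold:", threshold)
print("Hard-coded rule errors:",
      sum(hard_coded_rule(h) != y for h, y in zip(hours, passed)))
```

On this toy data the learned cutoff is 5 hours and classifies every example correctly, while the hand-picked cutoff of 6 misclassifies the student who studied 5 hours.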

Types of Machine Learning

Supervised Learning

Each training example has inputs and a label (the correct output).

  • Regression: predict a number (price, temperature).
  • Classification: predict a class (spam vs non-spam).
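
Regression is illustrated by the linear regression example later in this section; for classification, a minimal sketch with scikit-learn's `LogisticRegression` might look like this. The two features (number of links and count of the word "free") and all data values are hypothetical, chosen only so that spam and non-spam are easy to tell apart.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [number of links, times the word "free" appears]
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],    # non-spam examples
              [5, 3], [6, 4], [4, 5], [7, 2]])   # spam examples
# Labels: 0 = non-spam, 1 = spam
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Train a classifier on the labeled examples
clf = LogisticRegression()
clf.fit(X, y)

# Classify two new, unseen emails
new_emails = np.array([[0, 0], [6, 5]])
print("Predicted classes:", clf.predict(new_emails))
print("Spam probabilities:", clf.predict_proba(new_emails)[:, 1])
```

`predict` returns class labels (0 or 1), while `predict_proba` returns an estimated probability for each class; column 1 holds the probability of spam.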

Unsupervised Learning

Data has inputs only, with no labels; the algorithm must find structure on its own.

  • Clustering: group similar items (e.g., customers with similar buying habits).
  • Dimensionality reduction: compress many features into fewer while keeping most of the information.
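
Both unsupervised tasks can be sketched with scikit-learn: `KMeans` for clustering and `PCA` (principal component analysis) for dimensionality reduction. The customer data below is hypothetical, and no labels are passed to either model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical customers: [annual spend (thousands), visits per month]; no labels.
X = np.array([[1.0, 2], [1.2, 3], [0.8, 2],      # low spenders
              [8.0, 10], [8.5, 12], [7.5, 11]])  # high spenders

# Clustering: group similar customers into k = 2 clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print("Cluster of each customer:", cluster_ids)

# Dimensionality reduction: compress 2 features into 1 principal component
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)
print("Compressed shape:", X_1d.shape)  # (6, 1)
print("Fraction of variance kept:", pca.explained_variance_ratio_[0])
```

Cluster numbers are arbitrary (the low spenders may be labeled 0 or 1); what matters is which customers end up together. In practice, features measured in different units are usually standardized before clustering or PCA.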

Reinforcement Learning

An agent learns by trial and error to maximize rewards (e.g., game playing, robotics).
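
A full RL problem has states that the agent's actions change, but the trial-and-error idea already shows up in the simpler multi-armed bandit setting. The sketch below is a hypothetical epsilon-greedy agent in plain Python: three slot machines pay out with hidden probabilities, and the agent usually picks the machine with the best average reward so far (exploit) but tries a random one 10% of the time (explore).

```python
import random

# Hypothetical environment: 3 slot machines ("arms") with hidden win probabilities.
TRUE_WIN_PROB = [0.2, 0.5, 0.8]

def pull(arm, rng):
    """Return reward 1 with the arm's hidden probability, else 0."""
    return 1 if rng.random() < TRUE_WIN_PROB[arm] else 0

def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(TRUE_WIN_PROB)
    counts = [0] * n_arms    # how many times each arm was tried
    values = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                         # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit: best estimate
        reward = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean update
        total_reward += reward
    return values, counts, total_reward

values, counts, total = epsilon_greedy()
print("Estimated win rates:", [round(v, 2) for v in values])
print("Pulls per arm:", counts)
print("Total reward:", total)
```

After enough pulls the estimated win rates approach the hidden ones and most pulls go to the third machine; `epsilon` controls the trade-off between exploring and exploiting.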

Supervised Learning Pipeline

  1. Collect and clean data.
  2. Split data into train and test sets.
  3. Choose a model (e.g., Linear Regression).
  4. Train the model on training data.
  5. Evaluate performance on test data.
  6. Improve by tuning hyperparameters or trying a different model.
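
The six steps can be sketched end to end with scikit-learn. This is a minimal example under several assumptions: the data is synthetic (generated from y = 3x + 5 plus noise) rather than collected, the model is `Ridge` (linear regression with an L2 penalty, swapped in here because it has a hyperparameter to tune), and "improving" means a grid search over the penalty strength `alpha` with 5-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Step 1 (collect): hypothetical synthetic data, y = 3x + 5 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 1, size=200)

# Step 2 (split): hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Steps 3, 4 and 6 (choose, train, tune): try each alpha with 5-fold
# cross-validation on the training set only, then refit the best model
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Step 5 (evaluate): score once on the untouched test set
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print("Best alpha:", search.best_params_["alpha"])
print("Test MAE:", mae)
```

Tuning with cross-validation on the training set, rather than on the test set, keeps the final test score an honest estimate of performance on new data.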

Example: Linear Regression in scikit-learn

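Linear regression fits a straight line, price = w × size + b, that best describes the relationship between a numeric input (e.g., house size) and a numeric output (e.g., price). "Best" means the line that minimizes the sum of squared differences between predicted and actual values (ordinary least squares).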

Predict House Price from Size

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features (X): house sizes in square feet
# Must be 2D: each row is one example, each column a feature
X = np.array([[500], [750], [1000], [1250], [1500]])

# Target (y): house prices in thousands of dollars
y = np.array([100, 150, 200, 250, 300])

# Create the model
model = LinearRegression()

# Train the model on the data (fits slope and intercept by least squares)
model.fit(X, y)

# Predict price for a 1200 sq ft house
# (this toy data lies exactly on price = 0.2 * size, so expect about 240)
new_size = np.array([[1200]])
predicted_price = model.predict(new_size)

print("Predicted price (in thousands):", predicted_price[0])

# View learned parameters (slope and intercept); here about 0.2 and 0
print("Weight (slope):", model.coef_[0])
print("Bias (intercept):", model.intercept_)
```
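
For a single feature, the slope and intercept that ordinary least squares finds have a simple closed form, so the fit can be checked by hand. This sketch uses only NumPy and the same toy data; the formulas are the standard least-squares estimates, and the results should match `model.coef_[0]` and `model.intercept_` up to floating-point rounding.

```python
import numpy as np

# Same toy data: sizes in sq ft, prices in thousands of dollars
x = np.array([500, 750, 1000, 1250, 1500], dtype=float)
y = np.array([100, 150, 200, 250, 300], dtype=float)

# Ordinary least squares for one feature:
#   slope     w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept b = mean_y - w * mean_x
x_mean, y_mean = x.mean(), y.mean()
w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w * x_mean

print("Weight (slope):", w)       # about 0.2: each extra sq ft adds about $200
print("Bias (intercept):", b)     # about 0 for this perfectly linear data
print("Prediction for 1200 sq ft:", w * 1200 + b)
```

With several features, scikit-learn solves the same least-squares problem in matrix form rather than with these two scalar formulas.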

Key Terms

  • Feature: an input variable (e.g., size, rooms, location).
  • Label / Target: what we want to predict (e.g., price).
  • Model: a function that maps features to a prediction.
  • Parameters: internal values the model learns from training data (e.g., weights and bias).
  • Overfitting: the model memorizes the training data, including its noise, and performs poorly on new data.
  • Underfitting: the model is too simple to capture important patterns, so it performs poorly even on the training data.
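
Overfitting and underfitting are easiest to see by comparing training and test error as model flexibility grows. The sketch below fits polynomials of degree 1, 4, and 15 to hypothetical noisy, curved data using a scikit-learn pipeline (`PolynomialFeatures` followed by `LinearRegression`). The degrees and the data are illustrative choices, not taken from the text above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data from a curved (sine-shaped) relationship
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, size=40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

results = {}
for degree in [1, 4, 15]:
    # Higher polynomial degree means a more flexible model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Typically the degree-1 line has high error on both sets (underfitting), degree 4 does well on both, and degree 15 has the lowest training error but a worse test error (overfitting). Exact numbers depend on the random data.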