Deep Learning Tutorial
Master Deep Learning from fundamentals of neural networks to advanced architectures like CNNs, RNNs, Transformers, and GANs with practical implementations in TensorFlow and PyTorch.
Neural Networks
From perceptrons to deep nets
Computer Vision
CNN, YOLO, ResNet
NLP
RNN, LSTM, Transformers
Generative AI
GANs, VAEs
Introduction to Deep Learning
Deep Learning is a subset of machine learning that uses neural networks with many layers (deep neural networks) to model complex patterns in data. Inspired by the structure and function of the human brain, deep learning has revolutionized fields like computer vision, natural language processing, and generative AI.
Evolution of Deep Learning
- 1943: First neural network model (McCulloch-Pitts)
- 1958: Perceptron (Rosenblatt)
- 1986: Backpropagation popularized (Rumelhart, Hinton & Williams)
- 2012: AlexNet wins ImageNet (Modern DL era)
- 2017: Transformer architecture (Vaswani et al.)
- 2020+: GPT, DALL-E, Generative AI boom
Why Deep Learning?
- Automatic feature extraction - no manual feature engineering
- Scales with data - more data generally means better performance
- State-of-the-art results in vision, language, and speech
- Transfer learning - leverage pre-trained models
- Versatile architectures for many data types
- Backed by industry and research momentum (Google, Meta, OpenAI)
First Neural Network: Perceptron
A perceptron is the simplest form of a neural network: a single neuron that makes a binary decision by computing a weighted sum of its inputs.
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, epochs=100):
        self.lr = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def activation(self, x):
        # Step function
        return 1 if x >= 0 else 0

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.epochs):
            for idx, x_i in enumerate(X):
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_predicted = self.activation(linear_output)
                # Update weights and bias
                update = self.lr * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        linear_output = np.dot(X, self.weights) + self.bias
        return np.array([self.activation(x) for x in linear_output])
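As a quick sanity check, the same update rule can be run inline on the AND gate. AND is linearly separable, so the perceptron is guaranteed to converge; the learning rate and epoch count below are illustrative.

```python
import numpy as np

# AND gate truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0   # step activation
        update = lr * (yi - pred)            # perceptron update rule
        w += update * xi
        b += update

preds = [1 if xi @ w + b >= 0 else 0 for xi in X]
print(preds)  # -> [0, 0, 0, 1]
```

Note that a single perceptron cannot learn XOR, which is not linearly separable; that limitation is what motivates the multi-layer networks below.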
Neural Networks Fundamentals
At its core, a neural network consists of layers of interconnected neurons. Each connection has a weight, and each neuron has an activation function that determines its output.
Simple Feed-Forward Neural Network Architecture
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output: d/dz sigmoid(z) = s * (1 - s)
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * 0.5
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.5
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        # Forward propagation
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, output):
        # Backpropagation
        m = X.shape[0]
        # Output layer error
        self.dz2 = output - y
        self.dW2 = (1/m) * np.dot(self.a1.T, self.dz2)
        self.db2 = (1/m) * np.sum(self.dz2, axis=0, keepdims=True)
        # Hidden layer error
        self.da1 = np.dot(self.dz2, self.W2.T)
        self.dz1 = self.da1 * sigmoid_derivative(self.a1)
        self.dW1 = (1/m) * np.dot(X.T, self.dz1)
        self.db1 = (1/m) * np.sum(self.dz1, axis=0, keepdims=True)

    def update(self, lr=0.1):
        # Gradient descent
        self.W1 -= lr * self.dW1
        self.b1 -= lr * self.db1
        self.W2 -= lr * self.dW2
        self.b2 -= lr * self.db2
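Wiring the forward, backward, and update steps together gives a complete training loop. The self-contained sketch below inlines those same equations and trains on XOR; the learning rate, hidden size, and epoch count are illustrative choices, and the loss is tracked before and after to confirm learning.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

sigmoid = lambda z: 1 / (1 + np.exp(-z))
W1 = rng.normal(size=(2, 4)) * 0.5; b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)) * 0.5; b2 = np.zeros((1, 1))
lr, m = 0.5, X.shape[0]

def loss():
    # Binary cross-entropy on the current parameters
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

initial_loss = loss()
for _ in range(5000):
    # Forward pass
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    # Backward pass (sigmoid output + cross-entropy gives dz2 = a2 - y)
    dz2 = a2 - y
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
    # Gradient descent update
    W2 -= lr * (a1.T @ dz2) / m; b2 -= lr * dz2.mean(axis=0, keepdims=True)
    W1 -= lr * (X.T @ dz1) / m;  b1 -= lr * dz1.mean(axis=0, keepdims=True)

final_loss = loss()
print(initial_loss, "->", final_loss)  # loss should drop substantially
```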
Activation Functions
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns.
Sigmoid

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Range: (0, 1)
Best for: Binary classification output

Tanh

def tanh(x):
    return np.tanh(x)

Range: (-1, 1)
Best for: Hidden layers (zero-centered)

ReLU

def relu(x):
    return np.maximum(0, x)

Range: [0, ∞)
Best for: Most hidden layers
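Evaluating the three functions on the same inputs makes their ranges concrete:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # squashed into (0, 1); sigmoid(0) = 0.5
print(tanh(x))     # squashed into (-1, 1); tanh(0) = 0, zero-centered
print(relu(x))     # negatives clipped to 0: [0. 0. 2.]
```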
Deep Learning Frameworks
TensorFlow / Keras
High-level API for quick prototyping
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
PyTorch
Dynamic computation graphs, research-focused
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
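A PyTorch training loop follows the same pattern regardless of architecture: forward pass, loss, backward pass, optimizer step. The model dimensions, optimizer, and random data below are illustrative, not part of any particular task:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

X = torch.randn(16, 4)  # dummy inputs
y = torch.randn(16, 1)  # dummy targets

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):
    optimizer.zero_grad()        # clear accumulated gradients
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # backpropagate
    optimizer.step()             # update parameters
final_loss = loss.item()
print(initial_loss, "->", final_loss)
```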
Convolutional Neural Networks (CNNs)
CNNs are designed to process grid-like data such as images. They use convolutional layers, pooling layers, and fully connected layers.
import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional Block 1
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Convolutional Block 2
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Convolutional Block 3
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Classifier
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
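What a Conv2D layer actually computes can be seen in plain NumPy: slide a kernel over the image and sum the elementwise products of each patch. This toy sketch handles one channel with 'valid' padding (no padding), using a made-up image and a simple horizontal edge-detector kernel:

```python
import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=float)
edge_kernel = np.array([[1, -1]], dtype=float)  # responds to horizontal change
result = conv2d_valid(image, edge_kernel)
print(result)  # each row -> [0. 1. 0.]: fires only at the 1->0 edge
```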
Recurrent Neural Networks (RNNs) & LSTMs
RNNs are designed for sequential data. LSTMs mitigate the vanishing gradient problem with gated memory cells, letting them capture long-term dependencies.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 128, input_length=100),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
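The recurrence an LSTM refines can be written directly: a vanilla RNN carries a hidden state h through time, combining it with each new input. A self-contained sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 3, 5, 4

Wx = rng.normal(size=(d_in, d_hidden)) * 0.1      # input-to-hidden weights
Wh = rng.normal(size=(d_hidden, d_hidden)) * 0.1  # hidden-to-hidden weights
b = np.zeros(d_hidden)

xs = rng.normal(size=(seq_len, d_in))  # one sequence of 4 timesteps
h = np.zeros(d_hidden)
for x_t in xs:
    # Each step sees the current input AND the previous hidden state:
    # h_t = tanh(x_t Wx + h_{t-1} Wh + b)
    h = np.tanh(x_t @ Wx + h @ Wh + b)

print(h)  # final hidden state summarizes the whole sequence
```

Repeated multiplication by Wh is exactly what makes gradients vanish or explode over long sequences; the LSTM's gates are designed to control that flow.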
Transformers & Attention
The Transformer architecture uses self-attention mechanisms and has become the foundation of modern NLP (BERT, GPT).
Attention Mechanism
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    # Transpose the last two axes of K so this works for any batched shape
    scores = np.matmul(Q, np.swapaxes(K, -1, -2)) / np.sqrt(d_k)
    attention_weights = softmax(scores, axis=-1)
    output = np.matmul(attention_weights, V)
    return output, attention_weights
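A quick shape check makes the mechanism concrete: each query row's attention weights form a probability distribution over the keys. The tensors and dimensions below are random and purely illustrative; the computation is restated inline so the sketch runs on its own:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4, 8))  # (batch, seq_len, d_k)
K = rng.normal(size=(2, 4, 8))
V = rng.normal(size=(2, 4, 8))

d_k = K.shape[-1]
scores = np.matmul(Q, np.swapaxes(K, -1, -2)) / np.sqrt(d_k)
weights = softmax(scores)        # (2, 4, 4): one distribution per query
output = np.matmul(weights, V)   # (2, 4, 8): weighted mix of the values

print(output.shape, weights.sum(axis=-1))  # weight rows each sum to 1
```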
Generative Adversarial Networks (GANs)
GANs consist of a generator and a discriminator that compete against each other, producing realistic synthetic data.
Generator: Creates fake samples from random noise
Discriminator: Tries to distinguish real from fake
Training: Min-max game between generator and discriminator
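The min-max game can be made concrete with the standard GAN losses: the discriminator is rewarded for scoring real samples near 1 and fakes near 0, while the generator is rewarded when its fakes fool the discriminator. The discriminator outputs below are made-up numbers purely for illustration:

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy between predicted probabilities and targets
    eps = 1e-8
    return -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

d_real = np.array([0.9, 0.8])  # D(x): discriminator scores on real samples
d_fake = np.array([0.2, 0.3])  # D(G(z)): discriminator scores on fakes

# Discriminator loss: push real scores toward 1, fake scores toward 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
# Generator loss: push the discriminator's fake scores toward 1
g_loss = bce(d_fake, np.ones_like(d_fake))
print(d_loss, g_loss)
```

In training, each network minimizes its own loss in alternating steps, so improving one increases pressure on the other.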
Deep Learning Applications
Computer Vision
- Image Classification
- Object Detection (YOLO, SSD)
- Semantic Segmentation
- Face Recognition
Natural Language Processing
- Machine Translation
- Text Summarization
- Sentiment Analysis
- Chatbots & LLMs
Speech & Audio
- Speech Recognition
- Text-to-Speech
- Music Generation
- Speaker Identification