Neural Networks Basics
Understand the neuron: from the perceptron algorithm to multi-layer networks and backpropagation — with clean Python implementations.
- Perceptron: the building block
- Forward/Backward: the chain rule
- Activation: Sigmoid, ReLU
- NumPy: from scratch
The Perceptron — First Neural Model
Invented by Frank Rosenblatt in 1958, the perceptron is the simplest neural network: a single neuron that classifies linearly separable patterns.
How it works
- 1. Weighted sum: z = w·x + b
- 2. Step activation: ŷ = 1 if z ≥ 0 else 0
- 3. Update: w = w + lr*(y - ŷ)*x
Limitation
It can only learn linearly separable functions (AND, OR) and cannot learn XOR. This limitation contributed to the first AI winter and eventually led to multi-layer networks.
Key insight: depth matters.

import numpy as np

class Perceptron:
    def __init__(self, lr=0.01, epochs=15):
        self.lr = lr
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def activation(self, z):
        # step function
        return 1 if z >= 0 else 0

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.epochs):
            for idx, x_i in enumerate(X):
                linear = np.dot(x_i, self.weights) + self.bias
                y_pred = self.activation(linear)
                # perceptron rule: scale the error by the learning rate
                update = self.lr * (y[idx] - y_pred)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        linear = np.dot(X, self.weights) + self.bias
        return np.array([self.activation(z) for z in linear])
Try it on the AND gate – it converges in fewer than 10 epochs, as in the quick check below.
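A minimal sanity check, assuming the Perceptron class above and 0/1 labels for the AND truth table:

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND gate labels

p = Perceptron(lr=0.01, epochs=15)
p.fit(X, y)
print(p.predict(X))  # expected: [0 0 0 1]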
Activation Functions: Non-linearity is key
Without activation functions, stacked linear layers collapse into one linear transformation. Non-linear activations enable deep networks to approximate any function.
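A small numerical check of that collapse (shapes and values here are arbitrary): two stacked linear layers are reproduced exactly by a single linear layer with merged weights.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

# two linear layers with no activation in between
two_layers = (X @ W1 + b1) @ W2 + b2
# one equivalent linear layer: merge the weights and biases
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layers, one_layer))  # True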
Sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
Range (0,1), great for binary output, but prone to vanishing gradients.
Tanh
def tanh(x):
    return np.tanh(x)
Range (-1,1), zero-centered, with stronger gradients than sigmoid.
ReLU
def relu(x):
    return np.maximum(0, x)
No saturation for positive inputs and sparse activations, but dead neurons are a risk.
Leaky ReLU
def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)
A small slope for negative inputs avoids dead neurons.
Softmax
def softmax(x):
    ex = np.exp(x - np.max(x))  # subtract max for numerical stability
    return ex / ex.sum()
Converts a vector of scores into a multi-class probability distribution.
Forward Propagation & Backpropagation
Forward pass
Compute activations layer by layer, caching intermediate values for the backward pass.
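A minimal sketch of a two-layer forward pass, assuming the sigmoid helper defined above; W1, b1, W2, b2 are the layer parameters (the MLP class below stores the same quantities on self):

def forward_pass(X, W1, b1, W2, b2):
    # hidden layer: linear step followed by non-linearity
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    # output layer
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)
    # a1 and a2 are cached by the caller so the backward pass can reuse them
    return a1, a2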
Backward pass (chain rule)
∂L/∂W = (∂L/∂a) · (∂a/∂z) · (∂z/∂W)
# assumes sigmoid activations; with binary cross-entropy loss
# the output-layer gradient simplifies to a2 - y
def backward(self, X, y, a1, a2):
    m = X.shape[0]
    # output layer gradient
    dz2 = a2 - y.reshape(-1, 1)          # dL/dz2
    dW2 = (1/m) * a1.T @ dz2
    db2 = (1/m) * np.sum(dz2, axis=0, keepdims=True)
    # hidden layer gradient
    da1 = dz2 @ self.W2.T
    dz1 = da1 * (a1 * (1 - a1))          # sigmoid derivative
    dW1 = (1/m) * X.T @ dz1
    db1 = (1/m) * np.sum(dz1, axis=0, keepdims=True)
Multi-Layer Perceptron (MLP) from Scratch
A complete implementation of a neural network with one hidden layer, using only NumPy – the foundation of modern deep learning.
import numpy as np

class MLP:
    def __init__(self, input_size, hidden_size, output_size, lr=0.1):
        self.lr = lr
        self.W1 = np.random.randn(input_size, hidden_size) * 0.5
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.5
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_deriv(self, a):
        # expects the already-activated value a = sigmoid(z)
        return a * (1 - a)

    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, output):
        m = X.shape[0]
        # output-layer error (sigmoid + cross-entropy simplification)
        self.dz2 = output - y.reshape(-1, 1)
        self.dW2 = (1/m) * self.a1.T @ self.dz2
        self.db2 = (1/m) * np.sum(self.dz2, axis=0, keepdims=True)
        # propagate the error back to the hidden layer
        self.da1 = self.dz2 @ self.W2.T
        self.dz1 = self.da1 * self.sigmoid_deriv(self.a1)
        self.dW1 = (1/m) * X.T @ self.dz1
        self.db1 = (1/m) * np.sum(self.dz1, axis=0, keepdims=True)

    def update(self):
        self.W1 -= self.lr * self.dW1
        self.b1 -= self.lr * self.db1
        self.W2 -= self.lr * self.dW2
        self.b2 -= self.lr * self.db2

    def fit(self, X, y, epochs=1000):
        for i in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output)
            self.update()
            if i % 200 == 0:
                # mean squared error, reported for monitoring only
                loss = np.mean((output - y.reshape(-1, 1))**2)
                print(f"epoch {i}, loss: {loss:.6f}")
Neural Nets in Keras & PyTorch
TensorFlow/Keras
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='mse')
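Training then follows the usual Keras flow; a one-line sketch where X_train and y_train are hypothetical arrays with 4 features and binary labels:

model.fit(X_train, y_train, epochs=50, batch_size=16, validation_split=0.2)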
PyTorch
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return torch.sigmoid(self.out(x))
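A minimal training-loop sketch for the Net above, assuming float tensors X of shape (N, 4) and y of shape (N, 1); the loss and optimizer choices are illustrative:

net = Net()
criterion = nn.BCELoss()                 # matches the sigmoid output
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

for epoch in range(100):
    optimizer.zero_grad()                # reset accumulated gradients
    y_pred = net(X)                      # forward pass
    loss = criterion(y_pred, y)          # binary cross-entropy
    loss.backward()                      # autodiff fills in all gradients
    optimizer.step()                     # update parameters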
Both frameworks add automatic differentiation, GPU acceleration, and easy transfer learning.
Weight Initialization & Optimizers
Initialization
- Zero init → symmetry: every neuron computes the same thing, so nothing is learned
- Small random values (e.g. 0.01) – OK for shallow networks
- Xavier/Glorot init for sigmoid/tanh
- He init for ReLU (see the sketch after this list)
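A minimal NumPy sketch of the two scaled schemes; fan_in and fan_out stand for a layer's input and output sizes:

import numpy as np

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier (uniform): variance scaled by fan-in and fan-out, suits sigmoid/tanh
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He (normal): variance scaled by fan-in only, suits ReLU
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

W1 = he_init(4, 64)  # e.g. the first Dense layer of the Keras model above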
Optimizers
Batch GD, SGD, and mini-batch GD differ in how much data each update sees. Momentum smooths and accelerates updates, while Adam and RMSprop additionally adapt per-parameter learning rates.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
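A minimal NumPy sketch of the momentum idea (plain SGD subtracts lr * grad directly; momentum keeps a running velocity); the toy loop minimizes f(w) = w², whose gradient is 2w:

import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.1, beta=0.9):
    # velocity: exponential moving average of past gradients
    v = beta * v + (1 - beta) * grad
    w = w - lr * v
    return w, v

w, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    w, v = sgd_momentum_step(w, grad=2 * w, v=v)
print(w)  # has decayed close to 0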
Why do neural networks work?
Universal Approximation Theorem: A feedforward network with a single hidden layer can approximate any continuous function, given sufficient neurons and non-linear activation.
Real‑world usage
Regression & Forecasting
Housing prices, stock trends, energy load.
Classification
Spam detection, credit risk, medical diagnosis.
Feature learning
Autoencoders, embeddings, representation learning.