Related Neural Networks Links
Learn with the Neural Networks Tutorial, validate concepts with Neural Networks MCQ Questions, and prepare for interviews through Neural Networks Interview Questions and Answers.
NN Programming Practice
Every item on this page is a coding task—implement it in Python (prefer PyTorch; use NumPy where noted). Use a notebook or .py file, print tensor shapes, and assert expected behavior. For theory-only review, use the MCQ and interview Q&A pages. Match each block to the NN tutorial when you need context.
Neural Networks — Topic-wise Programming Practice
Each block aligns with the NN tutorial sidebar. Complete the code tasks in order; commit snippets to a repo or notebook so you can reuse them as templates.
Introduction & Perceptron
Review: What are NNs? · Perceptron
Write a function forward(x, w, b, activation) where x is shape (d,), w shape (d,), b scalar, and activation is "sigmoid" or "relu" on the pre-activation z = w @ x + b. Unit-test on a fixed random seed against torch.sigmoid / F.relu.
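A possible implementation of that card; d = 4 and the fixed seed are arbitrary choices, and the asserts compare against torch.sigmoid / F.relu as the task asks.

```python
# Sketch: single-neuron forward pass with a selectable activation.
import torch
import torch.nn.functional as F

def forward(x, w, b, activation="sigmoid"):
    z = w @ x + b                      # pre-activation (scalar tensor)
    if activation == "sigmoid":
        return torch.sigmoid(z)
    elif activation == "relu":
        return F.relu(z)
    raise ValueError(f"unknown activation: {activation}")

torch.manual_seed(0)
x, w, b = torch.randn(4), torch.randn(4), torch.tensor(0.5)
assert torch.allclose(forward(x, w, b, "sigmoid"), torch.sigmoid(w @ x + b))
assert torch.allclose(forward(x, w, b, "relu"), F.relu(w @ x + b))
```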
Implement the perceptron update rule in NumPy for logical AND on inputs in {0,1}². Loop until all points classified; print w, b each epoch. Then try XOR with a single layer and verify it never converges (log MSE or error count).
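One way this could look in NumPy; the learning rate 0.1 and the 100-epoch cap are arbitrary. Swapping the labels to XOR shows the error count never reaching zero.

```python
# Sketch: classic perceptron rule on logical AND.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])            # AND labels; change to [0, 1, 1, 0] for XOR
w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        update = lr * (yi - pred)      # perceptron update rule
        w += update * xi
        b += update
        errors += int(update != 0)
    print(epoch, w, b, "errors:", errors)
    if errors == 0:                    # converges for AND, never for XOR
        break
```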
MLP & Activation Functions
Review: MLP · Activations
nn.Sequential MLP for flat vectors: Build torch.nn.Sequential with Linear(784, 256), ReLU, Linear(256, 128), ReLU, Linear(128, 10). Pass a batch x of shape (32, 784); print output shape. Count parameters with sum(p.numel() for p in model.parameters()).
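A minimal sketch of that stack; for these exact layer sizes the parameter count works out to 235,146.

```python
# Sketch: 784 -> 256 -> 128 -> 10 MLP on a (32, 784) batch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
x = torch.randn(32, 784)
print(model(x).shape)                                  # torch.Size([32, 10])
print(sum(p.numel() for p in model.parameters()))      # 235146
```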
Vectorized NumPy: implement sigmoid, relu, and softmax (stable: subtract row max). Compare max absolute difference to torch on a random (4, 5) tensor (convert with torch.from_numpy).
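A possible NumPy version compared against torch on a random (4, 5) array; float64 is used throughout because torch.from_numpy keeps the NumPy dtype.

```python
# Sketch: vectorized activations with a numerically stable softmax.
import numpy as np
import torch
import torch.nn.functional as F

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def relu(x):    return np.maximum(x, 0.0)
def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

a = np.random.randn(4, 5)
t = torch.from_numpy(a)
print(np.abs(sigmoid(a) - torch.sigmoid(t).numpy()).max())
print(np.abs(relu(a)    - F.relu(t).numpy()).max())
print(np.abs(softmax(a) - F.softmax(t, dim=-1).numpy()).max())
```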
Forward Propagation & Loss
Review: Forward propagation · Loss functions
Register a forward_hook on each Linear (or print inside a custom forward) for a small MLP with batch B=16, input dim 20. Log tensor shape after each layer. Repeat with B=1 and confirm batch dimension behavior.
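A sketch with assumed hidden sizes (64 and 10); the hook fires on every forward pass, so calling the model twice covers both batch sizes.

```python
# Sketch: log output shapes of every nn.Linear via forward hooks.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

def log_shape(module, inputs, output):
    print(module, "->", tuple(output.shape))

for m in model.modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(log_shape)

model(torch.randn(16, 20))   # B=16: (16, 64) then (16, 10)
model(torch.randn(1, 20))    # B=1 keeps the batch dimension: (1, 64), (1, 10)
```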
MSELoss vs CrossEntropyLoss vs BCEWithLogitsLoss: Create toy tensors: regression targets (B, 1), multi-class logits (B, C) with integer labels (B,), binary logits (B, 1) with float labels (B, 1). Instantiate the three losses and call .forward; print scalar losses. Trigger a deliberate shape/dtype error and fix it.
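One way to set up the toy tensors; B = 8 and C = 5 are arbitrary, and the deliberate error case is left as a comment so the script still runs.

```python
# Sketch: the three loss APIs on shapes/dtypes they expect.
import torch
import torch.nn as nn

B, C = 8, 5
reg_pred,  reg_target  = torch.randn(B, 1), torch.randn(B, 1)
logits,    labels      = torch.randn(B, C), torch.randint(0, C, (B,))
bin_logits, bin_labels = torch.randn(B, 1), torch.randint(0, 2, (B, 1)).float()

print(nn.MSELoss()(reg_pred, reg_target).item())
print(nn.CrossEntropyLoss()(logits, labels).item())
print(nn.BCEWithLogitsLoss()(bin_logits, bin_labels).item())
# Deliberate error to study: CrossEntropyLoss with float labels of shape (B, 1)
# raises; the fix is integer class indices of shape (B,).
```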
Gradient Descent & Backpropagation
Review: Gradient descent · Backpropagation
Minimal example: model, optimizer = torch.optim.SGD(model.parameters(), lr=0.1), criterion, one batch x, y. Implement optimizer.zero_grad(), loss = criterion(model(x), y), loss.backward(), optimizer.step(). Assert loss.item() is finite; print model.layer.weight.grad.norm() before/after zero_grad.
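A self-contained sketch of that step; the Linear(20, 64) → Linear(64, 3) model and CrossEntropyLoss are stand-ins for whatever model and criterion you built above.

```python
# Sketch: one full optimization step on a single batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(32, 20), torch.randint(0, 3, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
print("grad norm after backward:", model[0].weight.grad.norm().item())
optimizer.step()
assert torch.isfinite(loss).item()

optimizer.zero_grad(set_to_none=True)
print("grad after zero_grad(set_to_none=True):", model[0].weight.grad)  # None
```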
Forward a batch where the pre-ReLU values are all negative. After backward(), show that gradients for that layer’s input are zero. Compare with a batch that has mixed positive/negative pre-activations.
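A minimal sketch that forces all-negative pre-activations by filling the weights with ones and the input with -10; the gradient reaching that layer's input is then exactly zero.

```python
# Sketch: ReLU blocks all gradient flow when every pre-activation is negative.
import torch
import torch.nn as nn

lin, relu = nn.Linear(4, 4), nn.ReLU()
x = torch.full((8, 4), -10.0, requires_grad=True)   # large negative inputs
with torch.no_grad():
    lin.weight.fill_(1.0)
    lin.bias.fill_(0.0)                              # pre-activations are all -40
relu(lin(x)).sum().backward()
print(x.grad.abs().max())                            # 0: every path is cut by ReLU
```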
Computational Graphs
Review: Computational graphs
L = (a * b + c)²: Create a, b, c as torch.tensor(..., requires_grad=True) with values 2, 3, 1. Compute L, call L.backward(), print a.grad, b.grad, c.grad. Derive the chain rule on paper and assert they match (expected: 42, 28, 14 for a, b, c at those values).
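A quick check of the hand derivation; the expected gradients 42, 28, 14 from the card are asserted directly.

```python
# Sketch: chain rule check for L = (a*b + c)**2 at a=2, b=3, c=1.
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)
L = (a * b + c) ** 2
L.backward()
# dL/da = 2(ab+c)*b = 42, dL/db = 2(ab+c)*a = 28, dL/dc = 2(ab+c) = 14
print(a.grad, b.grad, c.grad)
assert (a.grad.item(), b.grad.item(), c.grad.item()) == (42.0, 28.0, 14.0)
```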
torch.autograd.grad vs backward: Same graph: use torch.autograd.grad(L, [a, b, c], retain_graph=True) and confirm gradients match .backward(). Zero grads and repeat with create_graph=True on a simpler L = a**2 and compute the second derivative w.r.t. a.
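A sketch of the second-derivative part, assuming a scalar a = 3; autograd.grad on the first gradient with create_graph=True yields the second derivative.

```python
# Sketch: second derivative of L = a**2 via create_graph=True.
import torch

a = torch.tensor(3.0, requires_grad=True)
L = a ** 2
(g,)  = torch.autograd.grad(L, a, create_graph=True)   # dL/da  = 2a = 6
(g2,) = torch.autograd.grad(g, a)                       # d2L/da2 = 2
print(g.item(), g2.item())                              # 6.0 2.0
```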
Design, Initialization & Batch Norm
Review: Network design · Weight init · Batch norm
Write two nn.Sequential MLPs on the same synthetic regression task (e.g. y = sin(x₀) + 0.1*noise) with roughly equal parameter count: one wider/shallower, one narrower/deeper. Train both for fixed epochs; log final train MSE in a table.
xavier_uniform_ vs kaiming_uniform_ + BatchNorm1d: Clone the same 3-layer MLP twice; apply nn.init.xavier_uniform_ on one and kaiming_uniform_ on the other. Print pre-activation std after first forward. Add nn.BatchNorm1d after hidden layers; demonstrate model.train() vs model.eval() output difference on a batch of size 1.
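A possible setup with hidden width 128 and a batch of 256 for the std comparison; the batch-size-1 case in train() is noted in a comment because BatchNorm1d raises there.

```python
# Sketch: init comparison, then BatchNorm1d in eval vs train on batch size 1.
import copy
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(128, 128), nn.ReLU(),
                     nn.Linear(128, 128), nn.ReLU(),
                     nn.Linear(128, 10))
xavier, kaiming = copy.deepcopy(base), copy.deepcopy(base)
for m in xavier:
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
for m in kaiming:
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")

x = torch.randn(256, 128)
print("xavier pre-act std: ", xavier[0](x).std().item())
print("kaiming pre-act std:", kaiming[0](x).std().item())

bn_model = nn.Sequential(nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 10))
x1 = torch.randn(1, 128)
bn_model.eval()
print(bn_model(x1)[0, :3])   # eval uses running stats, so batch size 1 works
# In train() a batch of size 1 has zero variance per feature,
# so BatchNorm1d raises a ValueError there; that is the difference to demonstrate.
```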
Overfitting & Dropout
Review: Overfitting · Dropout
Use ≤50 samples from MNIST or random synthetic data; build a large MLP; train until train accuracy ≈100%. Record validation accuracy each epoch in a Python list and print the gap. No plotting required—just numbers.
nn.Dropout train vs eval: Fix input x; forward the same x ten times in train() with p=0.5 and show output variance. Switch to eval() and show outputs are identical across runs. Optionally compare to manual F.dropout(x, p=0.5, training=True).
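A sketch using a standalone nn.Dropout module; the (1, 8) input is arbitrary.

```python
# Sketch: dropout is stochastic in train() and the identity in eval().
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.randn(1, 8)

drop.train()
train_outs = torch.stack([drop(x) for _ in range(10)])
print("train variance across runs:", train_outs.var(dim=0).mean().item())   # > 0

drop.eval()
eval_outs = torch.stack([drop(x) for _ in range(10)])
print("eval outputs identical:", bool((eval_outs == eval_outs[0]).all()))    # True
```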
Optimizers, Learning Rate & Vanishing Gradients
Review: Optimizers · Learning rate · Vanishing / exploding
Train identical model/data for 20 epochs twice: torch.optim.SGD(..., momentum=0.9) vs torch.optim.Adam. Log loss each epoch; print final weights’ L2 norm for both runs. Use the same seed and dataloader order.
StepLR or CosineAnnealingLR: Wrap your optimizer in torch.optim.lr_scheduler.StepLR (or cosine). Print scheduler.get_last_lr() every epoch. Build a 5-layer MLP with Tanh and show vanishing first-layer .grad.norm(); repeat after swapping hidden activations to ReLU.
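A sketch assuming width 64, StepLR with step_size=5, and an MSE objective; swapping Tanh for ReLU is left as the comparison run.

```python
# Sketch: scheduler stepping per epoch + first-layer gradient norm of a deep Tanh MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

layers = []
for _ in range(5):
    layers += [nn.Linear(64, 64), nn.Tanh()]
model = nn.Sequential(*layers, nn.Linear(64, 1))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
x, y = torch.randn(32, 64), torch.randn(32, 1)

for epoch in range(10):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(epoch, "lr:", scheduler.get_last_lr(),
          "first-layer grad norm:", model[0].weight.grad.norm().item())
# Rerun with nn.ReLU() in place of nn.Tanh() and compare the first-layer norms.
```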
CNN, RNN, Attention & Transfer Learning
Review: CNN · RNN · Attention · Transfer learning
Conv2d output shape in PyTorch: x = torch.randn(8, 3, 32, 32); stack Conv2d(3,64,kernel_size=3,padding=1), ReLU, MaxPool2d(2). Print y.shape after each layer. Repeat with stride=2 conv instead of pool and compare shapes.
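A shape-check sketch; the stride-2 variant keeps padding=1 so both paths end at 16×16.

```python
# Sketch: conv / pool shape tracking on a (8, 3, 32, 32) batch.
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)
conv, act, pool = nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
y = conv(x); print(y.shape)    # (8, 64, 32, 32): padding=1 keeps H and W
y = act(y);  print(y.shape)    # unchanged
y = pool(y); print(y.shape)    # (8, 64, 16, 16)

strided = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
print(strided(x).shape)        # (8, 64, 16, 16): floor((32 + 2 - 3) / 2) + 1 = 16
```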
LSTM + MultiheadAttention toy batch: Tensor (B, T, D) = (4, 10, 32): run through nn.LSTM(D, H, batch_first=True) and print last hidden shape. Same tensor: treat as sequence length 10, use nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True) with self-attn (query=key=value); print output shape.
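A sketch with an assumed hidden size H = 64; MultiheadAttention also returns averaged attention weights, printed here only for their shape.

```python
# Sketch: LSTM and self-attention on the same (4, 10, 32) batch.
import torch
import torch.nn as nn

B, T, D, H = 4, 10, 32, 64
x = torch.randn(B, T, D)

lstm = nn.LSTM(D, H, batch_first=True)
out, (h_n, c_n) = lstm(x)
print(out.shape, h_n.shape)          # (4, 10, 64) and (1, 4, 64): layer dim comes first

attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
attn_out, attn_weights = attn(x, x, x)   # self-attention: query = key = value
print(attn_out.shape, attn_weights.shape)  # (4, 10, 32) and (4, 10, 10)
```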
Load torchvision.models.resnet18(weights="DEFAULT"); replace fc for 5 classes; freeze all parameters except fc. Run one optimizer step and assert only fc.weight.grad is non-None (others None or zero as expected).
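A transfer-learning sketch; weights=models.ResNet18_Weights.DEFAULT is the enum form of the "DEFAULT" string, and the random 224×224 batch is only there to drive one step.

```python
# Sketch: freeze the backbone, train only the new 5-class head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)   # new head defaults to requires_grad=True

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 5, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()

assert model.fc.weight.grad is not None
assert model.conv1.weight.grad is None           # frozen backbone received no gradient
```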
Evaluation Metrics & Frameworks
Review: Metrics · PyTorch · TensorFlow / Keras
Given random logits (N, C) and labels (N,), compute accuracy with argmax. For binary logits (N, 1), compute precision/recall at threshold 0 using vectorized boolean masks (no sklearn required). Compare your numbers to sklearn.metrics if installed.
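A vectorized sketch; clamp(min=1) guards against an empty positive set, which is an added safeguard rather than part of the card.

```python
# Sketch: accuracy from argmax, precision/recall from boolean masks at threshold 0.
import torch

N, C = 100, 4
logits, labels = torch.randn(N, C), torch.randint(0, C, (N,))
accuracy = (logits.argmax(dim=1) == labels).float().mean()
print("accuracy:", accuracy.item())

bin_logits = torch.randn(N, 1)
bin_labels = torch.randint(0, 2, (N, 1))
pred = bin_logits > 0                         # threshold 0 on logits == prob 0.5
tp = (pred & (bin_labels == 1)).sum().float()
precision = tp / pred.sum().clamp(min=1)
recall = tp / (bin_labels == 1).sum().clamp(min=1)
print("precision:", precision.item(), "recall:", recall.item())
```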
Implement the same 2-hidden-layer classifier in PyTorch (explicit training loop with zero_grad) and in TensorFlow/Keras (model.compile(optimizer='adam', loss='sparse_categorical_crossentropy'), model.fit). Train 3 epochs on identical numpy X, y; print final loss from both (within reason if seeds differ).
Programming: Shapes, Loss APIs & Device
- Write a script (a combined sketch follows this list): x = torch.randn(32, 784); nn.Linear(784, 256)(x) → print the shape; chain a second Linear to 10 classes. Assert the final shape is (32, 10).
- Programmatically verify Conv2d output: build the layer nn.Conv2d(3, 64, 3, padding=1), feed an input of shape (4, 3, 32, 32), print .shape; compare to the formula floor((W + 2p - k) / s) + 1 for height and width.
- Demonstrate in code: CrossEntropyLoss on logits + integer labels vs NLLLoss on log_softmax(dim=1) of the same logits; the losses should match within float tolerance.
- Optional: move the same two-line model to cuda if available; catch and print a clear error if tensors stay on CPU.
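A combined sketch covering the four items above; the CUDA branch simply reports when no GPU is present instead of raising.

```python
# Sketch: shape assertions, Conv2d formula check, CE vs NLL equivalence, optional device move.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(32, 784)
h = nn.Linear(784, 256)(x)
out = nn.Linear(256, 10)(h)
assert out.shape == (32, 10)

conv = nn.Conv2d(3, 64, 3, padding=1)
img = torch.randn(4, 3, 32, 32)
W, p, k, s = 32, 1, 3, 1
expected = math.floor((W + 2 * p - k) / s) + 1
assert conv(img).shape == (4, 64, expected, expected)

logits, labels = torch.randn(8, 5), torch.randint(0, 5, (8,))
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
assert torch.allclose(ce, nll)

if torch.cuda.is_available():
    print(nn.Linear(784, 10).to("cuda")(x.to("cuda")).shape)
else:
    print("CUDA not available; tensors stay on CPU.")
```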
Programming: Debug Common Training Bugs
Intentionally reproduce each bug in a minimal script, then fix it; a combined sketch of all three drills follows the list.
- Forgot zero_grad: run two backward steps without zeroing and watch gradients double or explode; add optimizer.zero_grad(set_to_none=True) and stabilize.
- Wrong loss input: pass softmax probabilities into CrossEntropyLoss (it should receive logits); switch to raw logits and confirm the loss decreases.
- Eval forgotten: run validation with model.train() and Dropout on, then call model.eval() and torch.no_grad(); compare the metric.
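A sketch of the three drills in one script; the tiny Linear(10, 3) model and the dropout stack are stand-ins.

```python
# Sketch: reproduce each bug, then show the fix inline.
import torch
import torch.nn as nn

model = nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
crit = nn.CrossEntropyLoss()
x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))

# 1) Forgotten zero_grad: gradients accumulate across backward calls.
crit(model(x), y).backward()
g1 = model.weight.grad.norm().item()
crit(model(x), y).backward()                 # no zero_grad: grad norm roughly doubles
print(g1, model.weight.grad.norm().item())
opt.zero_grad(set_to_none=True)              # the fix

# 2) Wrong loss input: CrossEntropyLoss expects raw logits, not softmax probabilities.
logits = model(x)
bad = crit(torch.softmax(logits, dim=1), y)  # runs, but optimizes the wrong quantity
good = crit(logits, y)
print(bad.item(), good.item())

# 3) Forgotten eval(): dropout stays active during validation.
dmodel = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5), nn.Linear(10, 3))
dmodel.train()
noisy = dmodel(x)
dmodel.eval()
with torch.no_grad():
    clean = dmodel(x)
print((noisy - clean).abs().mean().item())
```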
Weekly coding rhythm (example)
Mon: finish 2 topic-wise code cards + git commit
Wed: one small dataset + train/val split + log metrics dict
Fri: refactor into nn.Module + argparse or config dict
Weekend: ablation branch — e.g. with/without BatchNorm, same seed
Summary
- This page is programming-only: PyTorch/NumPy implementations, training steps, shape checks, and debug drills.
- Use the sidebar to open the matching tutorial when an API confuses you; use MCQ / interview pages for non-code review.
- Next: real-life examples of neural nets in production.
Close the series with industry use cases on the real-life examples page; they remain useful for interviews once you can ship code.