Neural Networks Practice
PyTorch / NumPy Code

Every item on this page is a coding task—implement it in Python (prefer PyTorch; use NumPy where noted). Use a notebook or .py file, print tensor shapes, and assert expected behavior. For theory-only review, use the MCQ and interview Q&A pages. Match each block to the NN tutorial when you need context.

Neural Networks — Topic-wise Programming Practice

Each block aligns with the NN tutorial sidebar. Complete the code tasks in order; commit snippets to a repo or notebook so you can reuse them as templates.

Introduction & Perceptron

Review: What are NNs? · Perceptron

Code — Single neuron forward (NumPy or PyTorch)
Topic: Intro · Level: Easy

Write a function forward(x, w, b, activation) where x is shape (d,), w shape (d,), b scalar, and activation is "sigmoid" or "relu" on the pre-activation z = w @ x + b. Unit-test on a fixed random seed against torch.sigmoid / F.relu.
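
A minimal sketch in PyTorch; the dimension d=4 and the seed are arbitrary choices for the test:

    import torch
    import torch.nn.functional as F

    def forward(x, w, b, activation="sigmoid"):
        # Pre-activation: dot product plus bias; x and w both have shape (d,)
        z = w @ x + b
        if activation == "sigmoid":
            return torch.sigmoid(z)
        if activation == "relu":
            return F.relu(z)
        raise ValueError(f"unknown activation: {activation}")

    torch.manual_seed(0)
    x, w, b = torch.randn(4), torch.randn(4), torch.tensor(0.5)
    assert torch.allclose(forward(x, w, b, "sigmoid"), torch.sigmoid(w @ x + b))
    assert torch.allclose(forward(x, w, b, "relu"), F.relu(w @ x + b))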

Code — 2D perceptron training loop
Topic: Perceptron · Level: Easy

Implement the perceptron update rule in NumPy for logical AND on inputs in {0,1}². Loop until all points classified; print w, b each epoch. Then try XOR with a single layer and verify it never converges (log MSE or error count).
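
One possible NumPy version of the AND loop (learning rate and epoch cap are arbitrary; the XOR run reuses the same loop with different labels):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])           # logical AND
    w, b, lr = np.zeros(2), 0.0, 1.0

    for epoch in range(20):
        errors = 0
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)
            update = lr * (yi - pred)     # perceptron rule: w += lr * (y - y_hat) * x
            w, b = w + update * xi, b + update
            errors += int(update != 0)
        print(epoch, w, b, "errors:", errors)
        if errors == 0:
            break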

MLP & Activation Functions

Review: MLP · Activations

Code — nn.Sequential MLP for flat vectors
Topic: MLP · Level: Easy

Build torch.nn.Sequential with Linear(784, 256), ReLU, Linear(256, 128), ReLU, Linear(128, 10). Pass a batch x of shape (32, 784); print output shape. Count parameters with sum(p.numel() for p in model.parameters()).
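
A compact version of this task might look like:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )
    x = torch.randn(32, 784)
    print(model(x).shape)                                # torch.Size([32, 10])
    print(sum(p.numel() for p in model.parameters()))    # 235146 weights + biases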

Code — Activations in NumPy vs PyTorch
Topic: Activations · Level: Easy

Vectorized NumPy: implement sigmoid, relu, and softmax (stable: subtract row max). Compare max absolute difference to torch on a random (4, 5) tensor (convert with torch.from_numpy).
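
A sketch of the softmax piece; sigmoid and relu follow the same compare-against-torch pattern:

    import numpy as np
    import torch
    import torch.nn.functional as F

    def softmax_np(z):
        # Subtract the row max before exponentiating for numerical stability
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 5)).astype(np.float32)
    diff = np.abs(softmax_np(z) - F.softmax(torch.from_numpy(z), dim=1).numpy()).max()
    print("max abs diff:", diff)   # expect ~1e-7 or smaller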

Forward Propagation & Loss

Review: Forward propagation · Loss functions

Code — Trace shapes through an MLP
Topic: Forward pass · Level: Easy

Register a forward_hook on each Linear (or print inside a custom forward) for a small MLP with batch B=16, input dim 20. Log tensor shape after each layer. Repeat with B=1 and confirm batch dimension behavior.
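
One way to hook the Linear layers (the hidden width 64 is an arbitrary choice):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

    def log_shape(module, inputs, output):
        # Runs after every forward pass of the hooked module
        print(module.__class__.__name__, "->", tuple(output.shape))

    for layer in model:
        if isinstance(layer, nn.Linear):
            layer.register_forward_hook(log_shape)

    model(torch.randn(16, 20))   # B=16
    model(torch.randn(1, 20))    # B=1: the batch dimension is kept, shapes start with 1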

Code — MSELoss vs CrossEntropyLoss vs BCEWithLogitsLoss
Topic: Loss · Level: Easy

Create toy tensors: regression targets (B, 1), multi-class logits (B, C) with integer labels (B,), and binary logits (B, 1) with float labels (B, 1). Instantiate the three losses, call each on its tensors, and print the scalar losses. Then trigger a deliberate shape/dtype error and fix it.
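
A sketch of the correct shape/dtype pairings, assuming B=8 and C=4; the deliberate-error step is left as the exercise:

    import torch
    import torch.nn as nn

    B, C = 8, 4
    mse = nn.MSELoss()(torch.randn(B, 1), torch.randn(B, 1))                        # regression
    ce  = nn.CrossEntropyLoss()(torch.randn(B, C), torch.randint(0, C, (B,)))       # int64 class labels
    bce = nn.BCEWithLogitsLoss()(torch.randn(B, 1), torch.randint(0, 2, (B, 1)).float())
    print(mse.item(), ce.item(), bce.item())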

Gradient Descent & Backpropagation

Review: Gradient descent · Backpropagation

Code — One full training step
Topic: Optimization · Level: Easy

Minimal example: model, optimizer = torch.optim.SGD(model.parameters(), lr=0.1), criterion, one batch x, y. Implement optimizer.zero_grad(), loss = criterion(model(x), y), loss.backward(), optimizer.step(). Assert loss.item() is finite; print model.layer.weight.grad.norm() before/after zero_grad.
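
A minimal single-step sketch, using a one-layer model so the weight gradient is easy to reach; set_to_none=False keeps a zero tensor around so the norm can still be printed after zeroing:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.MSELoss()
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    assert torch.isfinite(loss).item()
    print("grad norm after backward:", model.weight.grad.norm().item())
    optimizer.step()
    optimizer.zero_grad(set_to_none=False)
    print("grad norm after zero_grad:", model.weight.grad.norm().item())   # 0.0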

Code — Inspect gradients on dead ReLU
Topic: Backprop · Level: Intermediate

Forward a batch where the pre-ReLU values are all negative. After backward(), show that gradients for that layer’s input are zero. Compare with a batch that has mixed positive/negative pre-activations.
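
One way to force a dead layer, assuming a tiny Linear+ReLU stack with weights zeroed and the bias pushed negative; the mixed-sign comparison reuses the same code with the default init:

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(5, 5), nn.ReLU())
    x = torch.randn(8, 5, requires_grad=True)

    # Zero weights plus a negative bias make every pre-activation negative for any input
    with torch.no_grad():
        net[0].weight.zero_()
        net[0].bias.fill_(-1.0)

    net(x).sum().backward()
    print(x.grad.abs().max())   # tensor(0.): the whole layer is "dead" for this batch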

Computational Graphs

Review: Computational graphs

Code — Autograd on L = (a * b + c)²
Topic: Autodiff · Level: Easy

Create a, b, c as torch.tensor(..., requires_grad=True) with values 2, 3, 1. Compute L, call L.backward(), print a.grad, b.grad, c.grad. Derive the chain rule on paper and assert they match (expected: 42, 28, 14 for a, b, c at those values).
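
A direct transcription of the task:

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)
    c = torch.tensor(1.0, requires_grad=True)

    L = (a * b + c) ** 2
    L.backward()
    # Chain rule: dL/da = 2*(a*b + c)*b, dL/db = 2*(a*b + c)*a, dL/dc = 2*(a*b + c)
    print(a.grad, b.grad, c.grad)   # tensor(42.), tensor(28.), tensor(14.)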

Code — torch.autograd.grad vs backward
Topic: Autodiff · Level: Intermediate

Same graph: use torch.autograd.grad(L, [a, b, c], retain_graph=True) and confirm gradients match .backward(). Zero grads and repeat with create_graph=True on a simpler L = a**2 and compute second derivative w.r.t. a.
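
The create_graph part might look like this sketch on L = a**2:

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    L = a ** 2
    (g,)  = torch.autograd.grad(L, a, create_graph=True)   # dL/da = 2a
    (g2,) = torch.autograd.grad(g, a)                       # d^2L/da^2 = 2
    print(g.item(), g2.item())   # 4.0 and 2.0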

Design, Initialization & Batch Norm

Review: Network design · Weight init · Batch norm

Code — Widen vs deepen (parameter-matched)
Topic: Design · Level: Intermediate

Write two nn.Sequential MLPs on the same synthetic regression task (e.g. y = sin(x₀) + 0.1*noise) with roughly equal parameter count: one wider/shallower, one narrower/deeper. Train both for fixed epochs; log final train MSE in a table.

Code — xavier_uniform_ vs kaiming_uniform_ + BatchNorm1d
Topic: Init / BN · Level: Intermediate

Clone the same 3-layer MLP twice; apply nn.init.xavier_uniform_ on one and kaiming_uniform_ on the other. Print pre-activation std after first forward. Add nn.BatchNorm1d after hidden layers; demonstrate model.train() vs model.eval() output difference on a batch of size 1.
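
A sketch of the init comparison (widths are arbitrary; the BatchNorm1d train/eval check is left as the exercise):

    import torch
    import torch.nn as nn

    def make_mlp():
        return nn.Sequential(nn.Linear(100, 100), nn.Linear(100, 100), nn.Linear(100, 10))

    torch.manual_seed(0)
    xavier_net, kaiming_net = make_mlp(), make_mlp()
    for m in xavier_net:
        nn.init.xavier_uniform_(m.weight)
    for m in kaiming_net:
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")

    x = torch.randn(256, 100)
    print("xavier  first pre-activation std:", xavier_net[0](x).std().item())
    print("kaiming first pre-activation std:", kaiming_net[0](x).std().item())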

Overfitting & Dropout

Review: Overfitting · Dropout

Code — Intentionally overfit a tiny dataset
Topic: Generalization · Level: Easy

Use ≤50 samples from MNIST or random synthetic data; build a large MLP; train until train accuracy ≈100%. Record validation accuracy each epoch in a Python list and print the gap. No plotting required—just numbers.

Code — nn.Dropout train vs eval
Topic: Dropout · Level: Easy

Fix input x; forward the same x ten times in train() with p=0.5 and show output variance. Switch to eval() and show outputs are identical across runs. Optionally compare to manual F.dropout(x, p=0.5, training=True).
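
A minimal sketch of the train/eval contrast:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    drop = nn.Dropout(p=0.5)
    x = torch.randn(1, 8)

    drop.train()
    outs = torch.stack([drop(x) for _ in range(10)])
    print("train-mode variance across runs:", outs.var(dim=0).mean().item())  # > 0

    drop.eval()
    assert torch.equal(drop(x), drop(x))   # deterministic in eval mode
    assert torch.equal(drop(x), x)         # and an identity map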

Optimizers, Learning Rate & Vanishing Gradients

Review: Optimizers · Learning rate · Vanishing / exploding

Code — SGD vs Adam on the same script
Topic: Optimizers · Level: Easy

Train identical model/data for 20 epochs twice: torch.optim.SGD(..., momentum=0.9) vs torch.optim.Adam. Log loss each epoch; print final weights’ L2 norm for both runs. Use the same seed and dataloader order.

Code — StepLR or CosineAnnealingLR
Topic: LR / gradients · Level: Intermediate

Wrap your optimizer in torch.optim.lr_scheduler.StepLR (or cosine). Print scheduler.get_last_lr() every epoch. Build a 5-layer MLP with Tanh and show vanishing first-layer .grad.norm(); repeat after swapping hidden activations to ReLU.
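
The scheduler half might look like this sketch (step_size=5 and gamma=0.1 are arbitrary; the data pass is omitted and the Tanh-vs-ReLU comparison is a separate script):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    for epoch in range(15):
        # ... forward / backward over the data would go here ...
        optimizer.step()       # placeholder step so the scheduler is stepped after the optimizer
        scheduler.step()
        print(epoch, scheduler.get_last_lr())   # lr drops by 10x every 5 epochs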

CNN, RNN, Attention & Transfer Learning

Review: CNN · RNN · Attention · Transfer learning

Code — Conv2d output shape in PyTorch
Topic: CNN · Level: Easy

x = torch.randn(8, 3, 32, 32); stack Conv2d(3,64,kernel_size=3,padding=1), ReLU, MaxPool2d(2). Print y.shape after each layer. Repeat with stride=2 conv instead of pool and compare shapes.
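
A sketch of the shape trace:

    import torch
    import torch.nn as nn

    x = torch.randn(8, 3, 32, 32)
    conv, pool = nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.MaxPool2d(2)
    y = conv(x); print(y.shape)     # (8, 64, 32, 32): padding=1 keeps the spatial size
    y = pool(y); print(y.shape)     # (8, 64, 16, 16): pooling halves H and W

    strided = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
    print(strided(x).shape)         # (8, 64, 16, 16): a stride-2 conv also halves H and W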

Code — LSTM + MultiheadAttention toy batch
Topic: RNN / attention · Level: Intermediate

Tensor (B, T, D) = (4, 10, 32): run through nn.LSTM(D, H, batch_first=True) and print last hidden shape. Same tensor: treat as sequence length 10, use nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True) with self-attn (query=key=value); print output shape.
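
A sketch with the stated sizes (the hidden size H=64 is an arbitrary choice):

    import torch
    import torch.nn as nn

    B, T, D, H = 4, 10, 32, 64
    x = torch.randn(B, T, D)

    lstm = nn.LSTM(D, H, batch_first=True)
    out, (h_n, c_n) = lstm(x)
    print(out.shape, h_n.shape)      # (4, 10, 64) and (1, 4, 64)

    attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
    attn_out, attn_weights = attn(x, x, x)   # self-attention: query = key = value
    print(attn_out.shape)            # (4, 10, 32)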

Code — Freeze backbone, train new head
Topic: Transfer · Level: Intermediate

Load torchvision.models.resnet18(weights=DEFAULT); replace fc for 5 classes; freeze all parameters except fc. Run one optimizer step and assert only fc.weight.grad is non-None (others None or zero as expected).
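
One possible version, assuming torchvision ≥ 0.13 (the pretrained weights are downloaded on first use):

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                        # freeze the backbone
    model.fc = nn.Linear(model.fc.in_features, 5)      # new head; requires_grad=True by default

    optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
    x, y = torch.randn(2, 3, 224, 224), torch.tensor([0, 3])
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    assert model.fc.weight.grad is not None
    assert model.conv1.weight.grad is None             # frozen parameters receive no gradient
    optimizer.step()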

Evaluation Metrics & Frameworks

Review: Metrics · PyTorch · TensorFlow / Keras

Code — Accuracy, precision, recall from logits
Topic: Metrics · Level: Easy

Given random logits (N, C) and labels (N,), compute accuracy with argmax. For binary logits (N, 1), compute precision/recall at threshold 0 using vectorized boolean masks (no sklearn required). Compare your numbers to sklearn.metrics if installed.
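
A vectorized sketch (random data, so the numbers themselves are meaningless; only the mechanics matter):

    import torch

    N, C = 100, 4
    logits, labels = torch.randn(N, C), torch.randint(0, C, (N,))
    accuracy = (logits.argmax(dim=1) == labels).float().mean()

    bin_logits = torch.randn(N, 1)
    bin_labels = torch.randint(0, 2, (N, 1)).bool()
    preds = bin_logits > 0                              # threshold 0 on the logit scale = 0.5 probability
    tp = (preds & bin_labels).sum().float()
    precision = tp / preds.sum().clamp(min=1)
    recall    = tp / bin_labels.sum().clamp(min=1)
    print(accuracy.item(), precision.item(), recall.item())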

Code — Same MLP in PyTorch vs Keras
Topic: PyTorch / TF · Level: Intermediate

Implement the same 2-hidden-layer classifier in PyTorch (explicit training loop with zero_grad) and in TensorFlow/Keras (model.compile(optimizer='adam', loss='sparse_categorical_crossentropy'), model.fit). Train 3 epochs on identical NumPy X, y; print the final loss from both (expect them to be close but not identical if seeds differ).

Tip: Theory-only review stays on MCQs and interview Q&A; this page is code-only.

Programming: Shapes, Loss APIs & Device

  1. Write a script: x = torch.randn(32, 784); nn.Linear(784, 256)(x) → print shape; chain second Linear to 10 classes. Assert final shape is (32, 10).
  2. Programmatically verify Conv2d output: build layer nn.Conv2d(3, 64, 3, padding=1), input (4, 3, 32, 32), print .shape; compare to formula floor((W+2p-k)/s)+1 for height/width.
  3. Demonstrate in code: CrossEntropyLoss on logits + integer labels vs NLLLoss on log_softmax(dim=1) of the same logits—losses should match within float tolerance (see the sketch after this list).
  4. Optional: move the same two-line model to cuda if available; catch and print a clear error if tensors stay on CPU.
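
A sketch for item 3, showing the CrossEntropyLoss / NLLLoss equivalence:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(16, 5)
    labels = torch.randint(0, 5, (16,))
    ce  = F.cross_entropy(logits, labels)
    nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
    assert torch.allclose(ce, nll)   # CrossEntropyLoss == NLLLoss applied to log_softmax(logits)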

Programming: Debug Common Training Bugs

Fix or intentionally reproduce each bug in a minimal script, then correct it.

  1. Forgot zero_grad: run two backward steps without zeroing—watch gradients explode or double; add optimizer.zero_grad(set_to_none=True) and stabilize (a minimal reproduction follows this list).
  2. Wrong loss input: pass softmax probabilities into CrossEntropyLoss (should be logits); switch to raw logits and confirm loss decreases.
  3. Eval forgotten: run validation with model.train() and Dropout on—then call model.eval() and torch.no_grad(); compare metric.
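
A minimal reproduction of bug 1; uncommenting the zero_grad line is the fix:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    for step in range(2):
        loss = nn.MSELoss()(model(x), y)
        loss.backward()                                   # without zero_grad, gradients accumulate
        print(step, model.weight.grad.norm().item())      # second norm includes the first step's grads
        optimizer.step()
        # Fix: uncomment so each step sees only its own gradient
        # optimizer.zero_grad(set_to_none=True)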

Weekly coding rhythm (example)

Notebook / repo habits
Mon:  finish 2 topic-wise code cards + git commit
Wed:  one small dataset + train/val split + log metrics dict
Fri:  refactor into nn.Module + argparse or config dict
Weekend: ablation branch — e.g. with/without BatchNorm, same seed

Summary

  • This page is programming-only: PyTorch/NumPy implementations, training steps, shape checks, and debug drills.
  • Use the sidebar to open the matching tutorial when an API confuses you; use MCQ / interview pages for non-code review.
  • Next: real-life examples of neural nets in production.

Close the series with industry use cases on the real-life examples page—still helpful for interviews after you can ship code.