Neural Networks Practice
PyTorch / NumPy Code

Every item on this page is a coding task—implement it in Python (prefer PyTorch; use NumPy where noted). Use a notebook or .py file, print tensor shapes, and assert expected behavior. For theory-only review, use the MCQ and interview Q&A pages. Match each block to the NN tutorial when you need context.

Neural Networks — Topic-wise Programming Practice

Each block aligns with the NN tutorial sidebar. Complete the code tasks in order; commit snippets to a repo or notebook so you can reuse them as templates.

Introduction & Perceptron

Review: What are NNs? · Perceptron

Code — Single neuron forward (NumPy or PyTorch)
Topic: Intro · Level: Easy

Write a function forward(x, w, b, activation) where x is shape (d,), w shape (d,), b scalar, and activation is "sigmoid" or "relu" on the pre-activation z = w @ x + b. Unit-test on a fixed random seed against torch.sigmoid / F.relu.
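
A minimal sketch in PyTorch; the dimension d=4 and the seed are arbitrary choices for the test:

    import torch
    import torch.nn.functional as F

    def forward(x, w, b, activation="sigmoid"):
        # Pre-activation: dot product plus bias; x and w both have shape (d,)
        z = w @ x + b
        if activation == "sigmoid":
            return torch.sigmoid(z)
        if activation == "relu":
            return F.relu(z)
        raise ValueError(f"unknown activation: {activation}")

    torch.manual_seed(0)
    x, w, b = torch.randn(4), torch.randn(4), torch.tensor(0.5)
    assert torch.allclose(forward(x, w, b, "sigmoid"), torch.sigmoid(w @ x + b))
    assert torch.allclose(forward(x, w, b, "relu"), F.relu(w @ x + b))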

Code — 2D perceptron training loop
Topic: Perceptron · Level: Easy

Implement the perceptron update rule in NumPy for logical AND on inputs in {0,1}². Loop until all points classified; print w, b each epoch. Then try XOR with a single layer and verify it never converges (log MSE or error count).
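
One possible NumPy version of the AND loop (learning rate and epoch cap are arbitrary; the XOR run reuses the same loop with different labels):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])           # logical AND
    w, b, lr = np.zeros(2), 0.0, 1.0

    for epoch in range(20):
        errors = 0
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)
            update = lr * (yi - pred)     # perceptron rule: w += lr * (y - y_hat) * x
            w, b = w + update * xi, b + update
            errors += int(update != 0)
        print(epoch, w, b, "errors:", errors)
        if errors == 0:
            break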

MLP & Activation Functions

Review: MLP · Activations

Code — nn.Sequential MLP for flat vectors
Topic: MLP · Level: Easy

Build torch.nn.Sequential with Linear(784, 256), ReLU, Linear(256, 128), ReLU, Linear(128, 10). Pass a batch x of shape (32, 784); print output shape. Count parameters with sum(p.numel() for p in model.parameters()).
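
A compact version of this task might look like:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )
    x = torch.randn(32, 784)
    print(model(x).shape)                                # torch.Size([32, 10])
    print(sum(p.numel() for p in model.parameters()))    # 235146 weights + biases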

Code — Activations in NumPy vs PyTorch
Topic: Activations · Level: Easy

Vectorized NumPy: implement sigmoid, relu, and softmax (stable: subtract row max). Compare max absolute difference to torch on a random (4, 5) tensor (convert with torch.from_numpy).
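
A sketch of the softmax piece; sigmoid and relu follow the same compare-against-torch pattern:

    import numpy as np
    import torch
    import torch.nn.functional as F

    def softmax_np(z):
        # Subtract the row max before exponentiating for numerical stability
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    z = rng.normal(size=(4, 5)).astype(np.float32)
    diff = np.abs(softmax_np(z) - F.softmax(torch.from_numpy(z), dim=1).numpy()).max()
    print("max abs diff:", diff)   # expect ~1e-7 or smaller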

Forward Propagation & Loss

Review: Forward propagation · Loss functions

Code — Trace shapes through an MLP
Topic: Forward pass · Level: Easy

Register a forward_hook on each Linear (or print inside a custom forward) for a small MLP with batch B=16, input dim 20. Log tensor shape after each layer. Repeat with B=1 and confirm batch dimension behavior.
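
One way to hook the Linear layers (the hidden width 64 is an arbitrary choice):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

    def log_shape(module, inputs, output):
        # Runs after every forward pass of the hooked module
        print(module.__class__.__name__, "->", tuple(output.shape))

    for layer in model:
        if isinstance(layer, nn.Linear):
            layer.register_forward_hook(log_shape)

    model(torch.randn(16, 20))   # B=16
    model(torch.randn(1, 20))    # B=1: the batch dimension is kept, shapes start with 1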

Code — MSELoss vs CrossEntropyLoss vs BCEWithLogitsLoss
Topic: Loss · Level: Easy

Create toy tensors: regression targets (B, 1), multi-class logits (B, C) with integer labels (B,), and binary logits (B, 1) with float labels (B, 1). Instantiate the three losses, call each on its tensors, and print the scalar losses. Then trigger a deliberate shape/dtype error and fix it.
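
A sketch of the correct shape/dtype pairings, assuming B=8 and C=4; the deliberate-error step is left as the exercise:

    import torch
    import torch.nn as nn

    B, C = 8, 4
    mse = nn.MSELoss()(torch.randn(B, 1), torch.randn(B, 1))                        # regression
    ce  = nn.CrossEntropyLoss()(torch.randn(B, C), torch.randint(0, C, (B,)))       # int64 class labels
    bce = nn.BCEWithLogitsLoss()(torch.randn(B, 1), torch.randint(0, 2, (B, 1)).float())
    print(mse.item(), ce.item(), bce.item())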

Gradient Descent & Backpropagation

Review: Gradient descent · Backpropagation

Code — One full training step
Topic: Optimization · Level: Easy

Minimal example: model, optimizer = torch.optim.SGD(model.parameters(), lr=0.1), criterion, one batch x, y. Implement optimizer.zero_grad(), loss = criterion(model(x), y), loss.backward(), optimizer.step(). Assert loss.item() is finite; print model.layer.weight.grad.norm() before/after zero_grad.
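
A minimal single-step sketch, using a one-layer model so the weight gradient is easy to reach; set_to_none=False keeps a zero tensor around so the norm can still be printed after zeroing:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.MSELoss()
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    assert torch.isfinite(loss).item()
    print("grad norm after backward:", model.weight.grad.norm().item())
    optimizer.step()
    optimizer.zero_grad(set_to_none=False)
    print("grad norm after zero_grad:", model.weight.grad.norm().item())   # 0.0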

Code — Inspect gradients on dead ReLU
Topic: Backprop · Level: Intermediate

Forward a batch where the pre-ReLU values are all negative. After backward(), show that gradients for that layer’s input are zero. Compare with a batch that has mixed positive/negative pre-activations.
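
One way to force a dead layer, assuming a tiny Linear+ReLU stack with weights zeroed and the bias pushed negative; the mixed-sign comparison reuses the same code with the default init:

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(5, 5), nn.ReLU())
    x = torch.randn(8, 5, requires_grad=True)

    # Zero weights plus a negative bias make every pre-activation negative for any input
    with torch.no_grad():
        net[0].weight.zero_()
        net[0].bias.fill_(-1.0)

    net(x).sum().backward()
    print(x.grad.abs().max())   # tensor(0.): the whole layer is "dead" for this batch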

Computational Graphs

Review: Computational graphs

Code — Autograd on L = (a * b + c)²
Topic: Autodiff · Level: Easy

Create a, b, c as torch.tensor(..., requires_grad=True) with values 2, 3, 1. Compute L, call L.backward(), print a.grad, b.grad, c.grad. Derive the chain rule on paper and assert they match (expected: 42, 28, 14 for a, b, c at those values).
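
A direct transcription of the task:

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)
    c = torch.tensor(1.0, requires_grad=True)

    L = (a * b + c) ** 2
    L.backward()
    # Chain rule: dL/da = 2*(a*b + c)*b, dL/db = 2*(a*b + c)*a, dL/dc = 2*(a*b + c)
    print(a.grad, b.grad, c.grad)   # tensor(42.), tensor(28.), tensor(14.)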

Code — torch.autograd.grad vs backward
Topic: Autodiff · Level: Intermediate

Same graph: use torch.autograd.grad(L, [a, b, c], retain_graph=True) and confirm gradients match .backward(). Zero grads and repeat with create_graph=True on a simpler L = a**2 and compute second derivative w.r.t. a.
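
The create_graph part might look like this sketch on L = a**2:

    import torch

    a = torch.tensor(2.0, requires_grad=True)
    L = a ** 2
    (g,)  = torch.autograd.grad(L, a, create_graph=True)   # dL/da = 2a
    (g2,) = torch.autograd.grad(g, a)                       # d^2L/da^2 = 2
    print(g.item(), g2.item())   # 4.0 and 2.0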

Design, Initialization & Batch Norm

Review: Network design · Weight init · Batch norm

Code — Widen vs deepen (parameter-matched)
Topic: Design · Level: Intermediate

Write two nn.Sequential MLPs on the same synthetic regression task (e.g. y = sin(x₀) + 0.1*noise) with roughly equal parameter count: one wider/shallower, one narrower/deeper. Train both for fixed epochs; log final train MSE in a table.

Code — xavier_uniform_ vs kaiming_uniform_ + BatchNorm1d
Topic: Init / BN · Level: Intermediate

Clone the same 3-layer MLP twice; apply nn.init.xavier_uniform_ on one and kaiming_uniform_ on the other. Print pre-activation std after first forward. Add nn.BatchNorm1d after hidden layers; demonstrate model.train() vs model.eval() output difference on a batch of size 1.
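
A sketch of the init comparison (widths are arbitrary; the BatchNorm1d train/eval check is left as the exercise):

    import torch
    import torch.nn as nn

    def make_mlp():
        return nn.Sequential(nn.Linear(100, 100), nn.Linear(100, 100), nn.Linear(100, 10))

    torch.manual_seed(0)
    xavier_net, kaiming_net = make_mlp(), make_mlp()
    for m in xavier_net:
        nn.init.xavier_uniform_(m.weight)
    for m in kaiming_net:
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")

    x = torch.randn(256, 100)
    print("xavier  first pre-activation std:", xavier_net[0](x).std().item())
    print("kaiming first pre-activation std:", kaiming_net[0](x).std().item())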

Overfitting & Dropout

Review: Overfitting · Dropout

Code — Intentionally overfit a tiny dataset
Topic: Generalization · Level: Easy

Use ≤50 samples from MNIST or random synthetic data; build a large MLP; train until train accuracy ≈100%. Record validation accuracy each epoch in a Python list and print the gap. No plotting required—just numbers.

Code — nn.Dropout train vs eval
Topic: Dropout · Level: Easy

Fix input x; forward the same x ten times in train() with p=0.5 and show output variance. Switch to eval() and show outputs are identical across runs. Optionally compare to manual F.dropout(x, p=0.5, training=True).
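
A minimal sketch of the train/eval contrast:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    drop = nn.Dropout(p=0.5)
    x = torch.randn(1, 8)

    drop.train()
    outs = torch.stack([drop(x) for _ in range(10)])
    print("train-mode variance across runs:", outs.var(dim=0).mean().item())  # > 0

    drop.eval()
    assert torch.equal(drop(x), drop(x))   # deterministic in eval mode
    assert torch.equal(drop(x), x)         # and an identity map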

Optimizers, Learning Rate & Vanishing Gradients

Review: Optimizers · Learning rate · Vanishing / exploding

Code — SGD vs Adam on the same script
Topic: Optimizers · Level: Easy

Train identical model/data for 20 epochs twice: torch.optim.SGD(..., momentum=0.9) vs torch.optim.Adam. Log loss each epoch; print final weights’ L2 norm for both runs. Use the same seed and dataloader order.

Code — StepLR or CosineAnnealingLR
Topic: LR / gradients · Level: Intermediate

Wrap your optimizer in torch.optim.lr_scheduler.StepLR (or cosine). Print scheduler.get_last_lr() every epoch. Build a 5-layer MLP with Tanh and show vanishing first-layer .grad.norm(); repeat after swapping hidden activations to ReLU.
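
The scheduler half might look like this sketch (step_size=5 and gamma=0.1 are arbitrary; the data pass is omitted and the Tanh-vs-ReLU comparison is a separate script):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    for epoch in range(15):
        # ... forward / backward over the data would go here ...
        optimizer.step()       # placeholder step so the scheduler is stepped after the optimizer
        scheduler.step()
        print(epoch, scheduler.get_last_lr())   # lr drops by 10x every 5 epochs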

CNN, RNN, Attention & Transfer Learning

Review: CNN · RNN · Attention · Transfer learning

Code — Conv2d output shape in PyTorch
Topic: CNN · Level: Easy

x = torch.randn(8, 3, 32, 32); stack Conv2d(3,64,kernel_size=3,padding=1), ReLU, MaxPool2d(2). Print y.shape after each layer. Repeat with stride=2 conv instead of pool and compare shapes.
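
A sketch of the shape trace:

    import torch
    import torch.nn as nn

    x = torch.randn(8, 3, 32, 32)
    conv, pool = nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.MaxPool2d(2)
    y = conv(x); print(y.shape)     # (8, 64, 32, 32): padding=1 keeps the spatial size
    y = pool(y); print(y.shape)     # (8, 64, 16, 16): pooling halves H and W

    strided = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
    print(strided(x).shape)         # (8, 64, 16, 16): a stride-2 conv also halves H and W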

Code — LSTM + MultiheadAttention toy batch
Topic: RNN / attention · Level: Intermediate

Tensor (B, T, D) = (4, 10, 32): run through nn.LSTM(D, H, batch_first=True) and print last hidden shape. Same tensor: treat as sequence length 10, use nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True) with self-attn (query=key=value); print output shape.
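
A sketch with the stated sizes (the hidden size H=64 is an arbitrary choice):

    import torch
    import torch.nn as nn

    B, T, D, H = 4, 10, 32, 64
    x = torch.randn(B, T, D)

    lstm = nn.LSTM(D, H, batch_first=True)
    out, (h_n, c_n) = lstm(x)
    print(out.shape, h_n.shape)      # (4, 10, 64) and (1, 4, 64)

    attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
    attn_out, attn_weights = attn(x, x, x)   # self-attention: query = key = value
    print(attn_out.shape)            # (4, 10, 32)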

Code — Freeze backbone, train new head
Topic: Transfer · Level: Intermediate

Load torchvision.models.resnet18(weights=DEFAULT); replace fc for 5 classes; freeze all parameters except fc. Run one optimizer step and assert only fc.weight.grad is non-None (others None or zero as expected).
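
One possible version, assuming torchvision ≥ 0.13 (the pretrained weights are downloaded on first use):

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                        # freeze the backbone
    model.fc = nn.Linear(model.fc.in_features, 5)      # new head; requires_grad=True by default

    optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
    x, y = torch.randn(2, 3, 224, 224), torch.tensor([0, 3])
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    assert model.fc.weight.grad is not None
    assert model.conv1.weight.grad is None             # frozen parameters receive no gradient
    optimizer.step()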

Evaluation Metrics & Frameworks

Review: Metrics · PyTorch · TensorFlow / Keras

Code — Accuracy, precision, recall from logits
Topic: Metrics · Level: Easy

Given random logits (N, C) and labels (N,), compute accuracy with argmax. For binary logits (N, 1), compute precision/recall at threshold 0 using vectorized boolean masks (no sklearn required). Compare your numbers to sklearn.metrics if installed.
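
A vectorized sketch (random data, so the numbers themselves are meaningless; only the mechanics matter):

    import torch

    N, C = 100, 4
    logits, labels = torch.randn(N, C), torch.randint(0, C, (N,))
    accuracy = (logits.argmax(dim=1) == labels).float().mean()

    bin_logits = torch.randn(N, 1)
    bin_labels = torch.randint(0, 2, (N, 1)).bool()
    preds = bin_logits > 0                              # threshold 0 on the logit scale = 0.5 probability
    tp = (preds & bin_labels).sum().float()
    precision = tp / preds.sum().clamp(min=1)
    recall    = tp / bin_labels.sum().clamp(min=1)
    print(accuracy.item(), precision.item(), recall.item())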

Code — Same MLP in PyTorch vs Keras
Topic: PyTorch / TF · Level: Intermediate

Implement the same 2-hidden-layer classifier in PyTorch (explicit training loop with zero_grad) and in TensorFlow/Keras (model.compile(optimizer='adam', loss='sparse_categorical_crossentropy'), model.fit). Train 3 epochs on identical NumPy X, y; print the final loss from both (expect them to be close but not identical if seeds differ).

Tip: Theory-only review stays on MCQs and interview Q&A; this page is code-only.

Programming: Shapes, Loss APIs & Device

  1. Write a script: x = torch.randn(32, 784); nn.Linear(784, 256)(x) → print shape; chain second Linear to 10 classes. Assert final shape is (32, 10).
  2. Programmatically verify Conv2d output: build layer nn.Conv2d(3, 64, 3, padding=1), input (4, 3, 32, 32), print .shape; compare to formula floor((W+2p-k)/s)+1 for height/width.
  3. Demonstrate in code: CrossEntropyLoss on logits + integer labels vs NLLLoss on log_softmax(dim=1) of the same logits—losses should match within float tolerance (see the sketch after this list).
  4. Optional: move the same two-line model to cuda if available; catch and print a clear error if tensors stay on CPU.
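
A sketch for item 3, showing the CrossEntropyLoss / NLLLoss equivalence:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(16, 5)
    labels = torch.randint(0, 5, (16,))
    ce  = F.cross_entropy(logits, labels)
    nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
    assert torch.allclose(ce, nll)   # CrossEntropyLoss == NLLLoss applied to log_softmax(logits)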

Programming: Debug Common Training Bugs

Fix or intentionally reproduce each bug in a minimal script, then correct it.

  1. Forgot zero_grad: run two backward steps without zeroing—watch gradients explode or double; add optimizer.zero_grad(set_to_none=True) and stabilize (a minimal reproduction follows this list).
  2. Wrong loss input: pass softmax probabilities into CrossEntropyLoss (should be logits); switch to raw logits and confirm loss decreases.
  3. Eval forgotten: run validation with model.train() and Dropout on—then call model.eval() and torch.no_grad(); compare metric.
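
A minimal reproduction of bug 1; uncommenting the zero_grad line is the fix:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    for step in range(2):
        loss = nn.MSELoss()(model(x), y)
        loss.backward()                                   # without zero_grad, gradients accumulate
        print(step, model.weight.grad.norm().item())      # second norm includes the first step's grads
        optimizer.step()
        # Fix: uncomment so each step sees only its own gradient
        # optimizer.zero_grad(set_to_none=True)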

Weekly coding rhythm (example)

Notebook / repo habits
Mon:  finish 2 topic-wise code cards + git commit
Wed:  one small dataset + train/val split + log metrics dict
Fri:  refactor into nn.Module + argparse or config dict
Weekend: ablation branch — e.g. with/without BatchNorm, same seed

Summary

  • This page is programming-only: PyTorch/NumPy implementations, training steps, shape checks, and debug drills.
  • Use the sidebar to open the matching tutorial when an API confuses you; use MCQ / interview pages for non-code review.
  • Next: real-life examples of neural nets in production.

Close the series with industry use cases on the real-life examples page—still helpful for interviews after you can ship code.