PyTorch: 20 Interview Questions & Answers
Tensors, autograd, nn.Module, DataLoader, custom layers, GPU training, mixed precision, TorchScript, and deployment. Concise answers for FAANG-level interviews.
Topics: Tensor · Autograd · nn.Module · DataLoader · CUDA · TorchScript
1 What is PyTorch? ⚡ Easy
Answer: PyTorch is an open‑source deep learning framework based on Torch, built around dynamic computation graphs (define‑by‑run). It provides tensor computation with GPU acceleration, automatic differentiation via autograd, and a modular ecosystem (torch.nn, torch.optim, torch.utils.data).
2 PyTorch Tensor vs NumPy array – differences? 📊 Medium
Answer: Both share a similar API, but PyTorch tensors can run on GPUs, support automatic differentiation, and integrate with deep learning ops; NumPy arrays are CPU‑only. Convert via .numpy() and torch.from_numpy() (both share memory with the source array).

```python
import torch, numpy as np
a = np.array([1, 2])
t = torch.from_numpy(a)  # shares memory with a
b = t.numpy()            # shares memory with t
```
3 How does autograd work? 🔥 Hard
Answer: autograd records operations on tensors with requires_grad=True to build a dynamic computation graph. During the backward pass, it traverses the graph in reverse to compute gradients using the chain rule. Gradients accumulate in the .grad attribute. Example: y = x² → dy/dx = 2x, so y.backward() stores 2x in x.grad.
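The x² example above can be sketched in a few lines (variable names are illustrative):

```python
import torch

# requires_grad=True tells autograd to record operations on x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2      # forward pass builds the graph: y = x^2

y.backward()    # backward pass applies the chain rule: dy/dx = 2x
print(x.grad)   # tensor(6.)
```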
4 What is nn.Module? 📊 Medium
Answer: The base class for all neural network layers/models. It encapsulates parameters (nn.Parameter), the forward method, and submodules, and provides .to(device), .train(), .eval(), and built‑in parameter tracking.
5 nn.Module vs nn.functional? 🔥 Hard
Answer: nn.Module holds state (parameters) and is the recommended way to build layers; nn.functional provides stateless functions (e.g., F.relu, F.cross_entropy). Typically you use nn.Linear (a module) for layers with weights but call F.relu inside forward.
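A minimal sketch of that split, using a hypothetical two‑layer net: the stateful nn.Linear layers are modules, while the stateless F.relu is called in forward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):           # hypothetical example model
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)  # stateful: owns weight and bias
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))     # stateless functional call
        return self.fc2(x)

net = TinyNet()
out = net(torch.randn(3, 4))        # parameters are tracked automatically
print(out.shape)                    # torch.Size([3, 2])
```

Parameter tracking comes for free: net.parameters() yields the weights and biases of both Linear submodules (4·8+8 + 8·2+2 = 58 scalars).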
6 Explain torch.utils.data.Dataset and DataLoader. 📊 Medium
Answer: A Dataset stores samples and their labels (implement __len__ and __getitem__). A DataLoader wraps a Dataset and provides batching, shuffling, and multi‑process loading (num_workers).
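A toy sketch of the pattern (the SquaresDataset name and data are made up for illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):      # hypothetical toy dataset
    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.x[i] ** 2   # (sample, label)

# DataLoader handles batching; the last batch may be smaller
loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batches = [(xb, yb) for xb, yb in loader]
print([len(xb) for xb, _ in batches])      # [4, 4, 2]
```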
7 How do you use GPU in PyTorch? ⚡ Easy
Answer: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu'), then model.to(device) and tensor = tensor.to(device). For tensors, .to() returns a new tensor, so always assign the result; model.to(device) moves parameters in place.
8 How does loss.backward() work? 📊 Medium
Answer: It computes gradients of the loss w.r.t. all tensors with requires_grad=True using backpropagation. Gradients are accumulated into .grad, so call optimizer.zero_grad() before each step.
9 What does optimizer.step() do? ⚡ Easy
Answer: Updates parameters using the gradients stored in their .grad fields, according to the optimization algorithm (SGD, Adam, etc.). Called after .backward().
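Putting zero_grad/backward/step together, a minimal training loop on a made‑up regression target (y = 3x):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(64, 1)
y = 3 * x                   # toy target: y = 3x

for _ in range(200):
    opt.zero_grad()         # clear accumulated gradients
    loss = loss_fn(model(x), y)
    loss.backward()         # compute gradients into .grad
    opt.step()              # apply the update rule

print(model.weight.item())  # close to 3.0
```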
10 How to create a custom autograd Function? 🔥 Hard
Answer: Subclass torch.autograd.Function and implement static forward(ctx, ...) and backward(ctx, grad_output) methods. Use ctx.save_for_backward to cache tensors needed for the backward pass.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # cache input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x > 0).float()  # gradient is 1 where x > 0
```
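A custom Function is invoked via .apply, not called directly. A quick check of the gradient (restating the class from the answer so the snippet is self‑contained):

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x > 0).float()

x = torch.tensor([-1.0, 2.0], requires_grad=True)
y = MyReLU.apply(x).sum()   # use .apply, never MyReLU()(x)
y.backward()
print(x.grad)               # tensor([0., 1.])
```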
11 How do you implement distributed training? 🔥 Hard
Answer: Use torch.nn.DataParallel (single‑node multi‑GPU, now discouraged) or torch.nn.parallel.DistributedDataParallel (single‑ or multi‑node). DDP is faster: it runs one process per GPU and synchronizes gradients via torch.distributed.
12 What is mixed precision training (AMP)? 📊 Medium
Answer: It uses both float16 and float32 to speed up training and reduce memory. PyTorch provides torch.cuda.amp: autocast() for the forward pass and GradScaler to prevent gradient underflow.
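A sketch of the canonical autocast + GradScaler pattern. To keep it runnable without a GPU, this version is hedged to fall back to bfloat16 autocast on CPU and disables the scaler there (the scaler is a no‑op when enabled=False):

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
# GradScaler only matters for float16 on CUDA; no-op otherwise
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

x = torch.randn(16, 8, device=device)
y = torch.randn(16, 1, device=device)

opt.zero_grad()
# autocast runs eligible ops in lower precision
with torch.autocast(device_type=device,
                    dtype=torch.float16 if device == 'cuda' else torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()   # scale loss to avoid fp16 underflow
scaler.step(opt)                # unscales grads, then calls opt.step()
scaler.update()
```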
13 When to use torch.no_grad()? ⚡ Easy
Answer: It disables gradient tracking, which is useful for inference and validation to save memory and computation, and when you need to modify tensors in place without affecting autograd.
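A minimal inference sketch; note that .eval() and no_grad() do different things (layer behavior vs. graph recording), so both are typically used:

```python
import torch

model = torch.nn.Linear(4, 2)
model.eval()                 # switch layers like dropout/BN to eval mode

with torch.no_grad():        # no graph is recorded inside this block
    out = model(torch.randn(5, 4))

print(out.requires_grad)     # False
```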
14 How to save/load a PyTorch model? 📊 Medium
Answer: Save: torch.save(model.state_dict(), 'path.pth'). Load: model.load_state_dict(torch.load('path.pth')) after rebuilding the model. Saving the full model with torch.save(model, ...) also works but is less flexible, since it pickles the class itself.
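A state_dict round trip, using an in‑memory buffer in place of 'path.pth' so the sketch is self‑contained:

```python
import io
import torch

model = torch.nn.Linear(4, 2)

buf = io.BytesIO()               # stands in for 'path.pth'
torch.save(model.state_dict(), buf)
buf.seek(0)

model2 = torch.nn.Linear(4, 2)   # must rebuild the same architecture
model2.load_state_dict(torch.load(buf))
print(torch.equal(model.weight, model2.weight))  # True
```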
15 Explain TorchScript. 🔥 Hard
Answer: A way to serialize PyTorch models for production (runnable from the C++ runtime). Two methods: tracing (torch.jit.trace), which records one concrete execution path, and scripting (torch.jit.script), which compiles the code and preserves data‑dependent control flow.
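A small sketch of why scripting matters: the branch below depends on the input values, so tracing would bake in whichever branch the example input took, while scripting keeps both.

```python
import torch

@torch.jit.script
def threshold(x: torch.Tensor) -> torch.Tensor:
    # data-dependent control flow: preserved by scripting
    if x.sum() > 0:
        return x * 2
    return -x

print(threshold(torch.tensor([1.0, 2.0])))    # tensor([2., 4.])
print(threshold(torch.tensor([-1.0, -2.0])))  # tensor([1., 2.])
```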
16 Dynamic computation graph advantage? 📊 Medium
Answer: The graph is built on the fly per forward pass, which gives easier debugging, variable‑length inputs, and Pythonic control flow. PyTorch has always been dynamic; TensorFlow 1.x used static graphs, and TF2 adopted eager execution.
17 How to do transfer learning? 📊 Medium
Answer: Load a pretrained model (e.g., from torchvision.models), freeze layers (param.requires_grad = False), replace the classifier head, and train only the new layers (optionally fine‑tune the rest later with a lower learning rate).
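A freeze‑and‑replace sketch. To stay self‑contained (no download), a tiny nn.Sequential stands in for the pretrained backbone; in practice you would load e.g. torchvision.models.resnet18(weights=...) and replace its .fc.

```python
import torch.nn as nn

# stand-in for a pretrained backbone (hypothetical; normally torchvision.models)
backbone = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

for param in backbone.parameters():
    param.requires_grad = False      # freeze pretrained weights

backbone[-1] = nn.Linear(16, 3)      # new head for 3 classes (trainable by default)

# only the new head's parameters will receive gradient updates
trainable = [p for p in backbone.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))   # 16*3 + 3 = 51
```

When building the optimizer, pass only the trainable parameters, e.g. torch.optim.Adam(trainable, lr=1e-3).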
18 What is gradient accumulation? 🔥 Hard
Answer: Gradients are accumulated over several mini‑batches before updating the weights, simulating a larger batch size when memory is limited. Call loss.backward() every micro‑batch, and step()/zero_grad() every N steps, dividing the loss by N to keep the gradient scale consistent.
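The pattern as a sketch, with made‑up data and N = 4 micro‑batches per update:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4                      # effective batch = 4 micro-batches

opt.zero_grad()
for i in range(8):                   # 8 micro-batches -> 2 optimizer steps
    x, y = torch.randn(2, 4), torch.randn(2, 1)
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so grads average over the window
    if (i + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```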
19 How to use TensorBoard with PyTorch? 📊 Medium
Answer: Use torch.utils.tensorboard.SummaryWriter to log scalars, histograms, images, and the model graph. Alternatively, use third‑party tools like Weights & Biases (wandb).
20 Common PyTorch pitfalls? 🔥 Hard
Answer: Forgetting optimizer.zero_grad(), not moving all tensors with .to(device), in‑place ops after gradient computation, mixing NumPy and torch carelessly, and detaching the graph incorrectly.
PyTorch Interview Cheat Sheet
Core
- torch.Tensor – GPU + grad
- autograd – dynamic graph
- nn.Module – parameter container
- optim – SGD, Adam, etc.
Data
- Dataset + DataLoader
- torchvision / torchtext
- Samplers, collate_fn
Production
- TorchScript / JIT
- ONNX export
- Distributed (DDP)
- AMP (mixed precision)
“PyTorch uses define‑by‑run – you write the code as you would in numpy, with automatic differentiation.”