Computer Vision Interview: 20 Essential Q&A (Updated 2026)

PyTorch Vision (torchvision): 20 Essential Q&A

Datasets, transforms, and reference models integrated with the PyTorch ecosystem.

~11 min read · 20 questions · Intermediate
transforms v2 · datasets · weights · models
1 What is torchvision? ⚡ easy
Answer: PyTorch domain library for vision—datasets, transforms, model architectures, and utilities (ops, io).
2 Transforms v2? 📊 medium
Answer: Tensor-based, TorchScript-friendly transforms with a consistent API across images, videos, bounding boxes, and masks—prefer them over the legacy PIL-based transforms.
3 Compose? ⚡ easy
Answer: Chain transforms in order—typically Resize → ToImage → ToDtype(scale) → Normalize before batching.
4 ImageFolder? 📊 medium
Answer: Folder-per-class dataset returning (image, label) pairs—combine with DataLoader for supervised classification finetuning.
5 Common augmentations? 📊 medium
Answer: RandomResizedCrop, hflip, ColorJitter, RandAugment—match train vs eval (no randomness at test).
6 Normalize mean/std? 📊 medium
Answer: Per-channel (x-mean)/std—use weights’ documented stats (ImageNet) when loading pretrained backbones.
7 models.resnet50 pattern? ⚡ easy
Answer: Factory functions return the architecture; pass weights=ResNet50_Weights.IMAGENET1K_V2 for pretrained weights.
from torchvision import models; m = models.resnet50(weights="DEFAULT")
8 Weights enums? 📊 medium
Answer: Typed enums carry meta (categories, metrics)—get_weight() or auto-download on first use; reproducible defaults.
9 Finetune classifier? 🔥 hard
Answer: Replace final FC layer to num_classes; freeze backbone optionally; differential LR for head vs body.
10 DataLoader notes? 📊 medium
Answer: num_workers, pin_memory=True on GPU, persistent_workers—collate_fn for variable-size detection batches.
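For detection, images in a batch have different sizes, so the default stacking collate fails; a common sketch is a collate_fn that keeps per-sample tuples:

```python
import torch
from torch.utils.data import DataLoader

def detection_collate(batch):
    # keep variable-size images and per-image target dicts as tuples
    return tuple(zip(*batch))

data = [(torch.rand(3, 300, 400), {"boxes": torch.zeros(2, 4)}),
        (torch.rand(3, 200, 250), {"boxes": torch.zeros(1, 4)})]

loader = DataLoader(data, batch_size=2, collate_fn=detection_collate)
images, targets = next(iter(loader))   # tuples of length 2, sizes preserved
```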
11 Detection helpers? 🔥 hard
Answer: NMS and box ops live in torchvision.ops; Faster R-CNN/Mask R-CNN reference implementations live in torchvision.models.detection; COCO evaluation (coco_eval) comes from the detection reference scripts via pycocotools.
12 ONNX export? 📊 medium
Answer: torch.onnx.export on wrapped model—watch dynamic axes and op support; verify in onnxruntime.
13 torchvision vs timm? 📊 medium
Answer: timm: huge model zoo; torchvision: tightly coupled PyTorch references—often mix timm backbone + custom head.
14 AMP? ⚡ easy
Answer: autocast + GradScaler—most torchvision ops support fp16 on CUDA; watch BatchNorm numerics.
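A minimal training-step sketch; on CPU the scaler and autocast are disabled, so the same code runs everywhere:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # no-op on CPU

x = torch.randn(8, 10, device=device)
y = torch.randint(0, 2, (8,), device=device)

with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()   # scaling avoids fp16 gradient underflow
scaler.step(opt)
scaler.update()
```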
15 torchvision.ops? 📊 medium
Answer: ROIAlign, NMS, box_iou—building blocks for detectors; CUDA kernels behind the scenes.
16 Video datasets? 📊 medium
Answer: Kinetics-style readers + temporal transforms—memory heavy; clip sampling strategies matter.
17 Extract features? 🔥 hard
Answer: Forward hooks or torchvision.models.feature_extraction.create_feature_extractor—FPN-style multi-scale features for segmentation/detection heads.
18 torch.jit? 🔥 hard
Answer: Trace or script model+transforms carefully—some dynamic Python in transforms blocks scripting.
19 Version coupling? ⚡ easy
Answer: torchvision releases track specific torch versions—install matched pairs to avoid binary incompatibility.
20 Debug pipeline? ⚡ easy
Answer: Visualize tensors after transforms; assert value ranges [0,1] or normalized; check label mapping in ImageFolder.

torchvision Cheat Sheet

Data
  • ImageFolder
  • v2 transforms
Models
  • weights=...
Train
  • DataLoader + AMP

💡 Pro tip: Match preprocessing to pretrained weights; use v2 transforms.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.