Computer Vision Interview
20 essential Q&A
Updated 2026
torchvision
PyTorch Vision (torchvision): 20 Essential Q&A
Datasets, transforms, and reference models integrated with the PyTorch ecosystem.
~11 min read
20 questions
Intermediate
transforms v2 · datasets · weights · models
1
What is torchvision?
⚡ easy
Answer: PyTorch domain library for vision—datasets, transforms, model architectures, and utilities (ops, io).
2
Transforms v2?
📊 medium
Answer: Tensor-based, TorchScript-friendly transforms with one consistent API for images, videos, bounding boxes, and masks—prefer them over the legacy PIL-only transforms.
3
Compose?
⚡ easy
Answer: Chain transforms in order—typically Resize → ToImage → ToDtype(scale) → Normalize before batching.
4
ImageFolder?
📊 medium
Answer: Folder-per-class dataset returning (image, label) pairs—pairs with DataLoader for supervised classification fine-tuning.
5
Common augmentations?
📊 medium
Answer: RandomResizedCrop, RandomHorizontalFlip, ColorJitter, RandAugment—keep train and eval pipelines separate (no randomness at test time).
6
Normalize mean/std?
📊 medium
Answer: Per-channel (x-mean)/std—use weights’ documented stats (ImageNet) when loading pretrained backbones.
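The per-channel arithmetic, written out by hand (this is what v2.Normalize computes), using the ImageNet stats documented with torchvision's pretrained weights:

```python
import torch

# Broadcast mean/std over the spatial dims of a CHW image.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

x = torch.rand(3, 4, 4)         # image already scaled to [0, 1]
normed = (x - mean) / std       # per-channel (x - mean) / std
```

Inverting with `normed * std + mean` recovers the original image, which is a handy check when visualizing normalized batches.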
7
models.resnet50 pattern?
⚡ easy
Answer: Factory functions return the architecture; pass
weights=ResNet50_Weights.IMAGENET1K_V2 for pretrained weights.
from torchvision import models; m = models.resnet50(weights="DEFAULT")
8
Weights enums?
📊 medium
Answer: Typed enums carry metadata (categories, metrics, preprocessing)—resolve by name via
get_weight() or let weights auto-download on first use; defaults are reproducible.
9
Finetune classifier?
🔥 hard
Answer: Replace final FC layer to num_classes; freeze backbone optionally; differential LR for head vs body.
10
DataLoader notes?
📊 medium
Answer: num_workers, pin_memory=True on GPU, persistent_workers—collate_fn for variable-size detection batches.
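A sketch with a toy in-memory dataset (the worker count is illustrative; tune it per machine):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 10, (32,)))
loader = DataLoader(
    ds,
    batch_size=8,
    shuffle=True,
    num_workers=2,            # parallel loading processes
    pin_memory=True,          # faster host-to-GPU copies when training on CUDA
    persistent_workers=True,  # keep workers alive across epochs
)
# For detection, pass collate_fn=lambda b: tuple(zip(*b)) so
# variable-size images/targets stay as lists instead of being stacked.
xb, yb = next(iter(loader))
```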
11
Detection helpers?
🔥 hard
Answer: NMS, RoIAlign, and box utilities live in torchvision.ops; Faster R-CNN / Mask R-CNN reference implementations live in torchvision.models.detection; COCO evaluation helpers (coco_eval) ship with the detection reference scripts.
12
ONNX export?
📊 medium
Answer: torch.onnx.export on wrapped model—watch dynamic axes and op support; verify in onnxruntime.
13
torchvision vs timm?
📊 medium
Answer: timm: huge model zoo; torchvision: tightly coupled PyTorch references—often mix timm backbone + custom head.
14
AMP?
⚡ easy
Answer: autocast + GradScaler—most torchvision ops support fp16 on CUDA; watch BatchNorm numerics.
15
torchvision.ops?
📊 medium
Answer: RoIAlign, NMS, box_iou—building blocks for detectors, with CUDA kernels behind the scenes.
16
Video datasets?
📊 medium
Answer: Kinetics-style readers + temporal transforms—memory heavy; clip sampling strategies matter.
17
Extract features?
🔥 hard
Answer: Forward hooks or the feature-extraction API (create_feature_extractor)—gives FPN-style multi-scale features for segmentation/detection heads.
18
torch.jit?
🔥 hard
Answer: Trace or script the model (and transforms) carefully—dynamic Python in some transforms blocks scripting.
19
Version coupling?
⚡ easy
Answer: torchvision releases track specific torch versions—install matched pairs to avoid binary incompatibility.
20
Debug pipeline?
⚡ easy
Answer: Visualize tensors after transforms; assert value ranges [0,1] or normalized; check label mapping in ImageFolder.
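The range checks can be written as cheap assertions dropped into the pipeline (the tensor here is a stand-in for an image after ToDtype(scale=True), before Normalize):

```python
import torch

x = torch.rand(3, 224, 224)  # stand-in for a transformed image

# Cheap invariants worth asserting while debugging a pipeline:
assert x.dtype == torch.float32
assert x.shape[0] in (1, 3)                            # channels-first
assert 0.0 <= float(x.min()) <= float(x.max()) <= 1.0  # pre-Normalize range
```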
torchvision Cheat Sheet
Data
- ImageFolder
- v2 transforms
Models
- weights=...
Train
- DataLoader + AMP
💡 Pro tip: Match preprocessing to pretrained weights; use v2 transforms.