Computer Vision Interview: 20 Essential Q&A (Updated 2026)

ImageNet: 20 Essential Q&A

The dataset that pretrained the modern CV era—structure, tasks, and caveats for transfer learning.

~10 min read · 20 questions · Intermediate
ILSVRC · synset · 1k classes · pretrain
1 What is ImageNet? ⚡ easy
Answer: Large-scale image dataset organized by WordNet synsets—millions of labeled images driving classification pretraining.
2 ILSVRC? 📊 medium
Answer: Annual challenge subset (~1.2M train, 50k val, 1000 classes) used historically for ImageNet-1K classification benchmarks.
3 What is a synset? 📊 medium
Answer: A WordNet synonym set (e.g. a specific dog breed)—each class is a disambiguated noun sense, reducing polysemy.
4 Scale? ⚡ easy
Answer: Roughly 1.28M training images for 1K ILSVRC classes—enough diversity to learn general visual features.
5 Why report top-5 error? 📊 medium
Answer: Fine-grained classes make single exact label harsh—top-5 was standard headline metric during AlexNet era.
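A minimal NumPy sketch of the top-k error computation described above (function name and toy logits are illustrative, not from any particular library):

```python
import numpy as np

def topk_error(logits, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest logits."""
    # Sort class indices by logit, descending, and keep the top k per row
    topk = np.argsort(logits, axis=1)[:, ::-1][:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

logits = np.array([[0.1, 0.9, 0.3, 0.2],
                   [0.8, 0.1, 0.05, 0.05]])
labels = np.array([2, 3])
print(topk_error(logits, labels, k=2))  # -> 0.5 (row 0 hit, row 1 miss)
```

With k=1 this is the ordinary top-1 error; ILSVRC-era leaderboards headlined k=5.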
6 Val vs test? 📊 medium
Answer: Public val for development; test held out for leaderboard—reproducible papers compare on val with fixed split.
7 Canonical preprocessing? 🔥 hard
Answer: Short side resize 256, center crop 224, mean/std normalization—must match weights (different for Inception vs ResNet sometimes).
# ImageNet norm: mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]
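The resize/crop/normalize arithmetic can be sketched in plain NumPy (helper names are illustrative; real pipelines typically use torchvision transforms):

```python
import numpy as np

# ImageNet normalization constants (ResNet-style torchvision weights)
MEAN = np.array([0.485, 0.456, 0.406])
STD  = np.array([0.229, 0.224, 0.225])

def resize_short_side(h, w, target=256):
    """New (h, w) after scaling so the short side equals `target`."""
    if h < w:
        return target, round(w * target / h)
    return round(h * target / w), target

def center_crop_box(h, w, crop=224):
    """Top-left corner of a centered `crop`x`crop` window."""
    return (h - crop) // 2, (w - crop) // 2

def normalize(img):
    """img: float array in [0,1], shape (H, W, 3) -> channel-wise standardized."""
    return (img - MEAN) / STD

h, w = resize_short_side(480, 640)   # -> (256, 341)
print(h, w, center_crop_box(h, w))   # -> 256 341 (16, 58)
```

The same constants must be used at inference as during pretraining; Inception-style weights instead expect inputs scaled to [-1, 1].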
8 Transfer learning role? 📊 medium
Answer: Backbone trained on ImageNet features edges/textures/objects—finetune on small domain datasets with smaller LR.
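A hedged PyTorch sketch of the smaller-LR finetuning setup (the tiny Sequential stands in for a real pretrained backbone such as torchvision's resnet50):

```python
import torch
import torch.nn as nn

# Toy stand-in for an ImageNet-pretrained backbone plus a fresh task head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 10)  # new head for a hypothetical 10-class target task

# Smaller LR for pretrained weights, larger LR for the randomly-init head
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(),     "lr": 1e-2},
], momentum=0.9)

print([g["lr"] for g in optimizer.param_groups])  # -> [0.0001, 0.01]
```

The exact LR ratio is a tuning choice; the pattern of per-group learning rates is the reusable part.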
9 Freeze backbone? 📊 medium
Answer: Train only the head early when data is tiny; unfreeze later—BatchNorm layers need care (running batch stats) during finetuning.
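A minimal PyTorch sketch of the freeze-plus-BatchNorm pattern (toy model; a real setup would freeze a pretrained backbone):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),   # "backbone"
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),  # "head"
)
head = model[5]

# Freeze everything, then re-enable gradients for the head only
for p in model.parameters():
    p.requires_grad = False
for p in head.parameters():
    p.requires_grad = True

# Keep BatchNorm in eval mode so the frozen running stats are used,
# instead of being overwritten by small finetuning batches
model.train()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
```

Forgetting the BatchNorm step is a classic finetuning bug: the conv weights stay frozen while the BN statistics silently drift.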
10 Noise / bias? 🔥 hard
Answer: Crowdsourced labels contain errors; geographic and demographic skew—ImageNet audit projects documented issues.
11 Hierarchical labels? 📊 medium
Answer: WordNet tree enables hierarchical metrics and zero-shot transfer—not all models exploit hierarchy in loss.
12 Classic augmentations? ⚡ easy
Answer: RandomResizedCrop, flip, color jitter—standard on ImageNet training recipes (RRC is critical for ResNet).
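The RandomResizedCrop sampling logic can be sketched in pure Python (this mirrors the classic recipe—area fraction uniform in [0.08, 1.0], aspect ratio log-uniform in [3/4, 4/3]—but is an illustrative reimplementation, not torchvision's code):

```python
import math, random

def sample_rrc_box(h, w, scale=(0.08, 1.0), ratio=(3/4, 4/3), attempts=10):
    """Sample (top, left, crop_h, crop_w) a la RandomResizedCrop."""
    area = h * w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(*log_ratio))
        cw = round(math.sqrt(target_area * aspect))
        ch = round(math.sqrt(target_area / aspect))
        if 0 < cw <= w and 0 < ch <= h:
            return random.randint(0, h - ch), random.randint(0, w - cw), ch, cw
    # Fallback after failed attempts: centered square crop of the short side
    s = min(h, w)
    return (h - s) // 2, (w - s) // 2, s, s
```

The crop is then resized to the training resolution (e.g. 224), giving scale and aspect invariance that fixed center-cropping lacks.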
13 Tiny ImageNet? ⚡ easy
Answer: Teaching subset (200 classes, 64×64)—useful for coursework; not the same distribution as full ImageNet.
14 Relation to Open Images? 📊 medium
Answer: Different project (multi-label, boxes)—don’t confuse with ImageNet-1K single-label classification.
15 Licensing? ⚡ easy
Answer: Images scraped from web with varying rights—research use common; commercial redeployment needs legal review.
16 ObjectNet lesson? 🔥 hard
Answer: Controls viewpoint/background—shows ImageNet-trained models rely on spurious cues; stresses robust evaluation.
17 Beyond single-label? 📊 medium
Answer: Web-scale image-text (CLIP) reduces reliance on pure ImageNet classification for pretraining—still often finetuned with IN-like data.
18 EfficientNet story? 📊 medium
Answer: Compound scaling depth/width/resolution on ImageNet—Pareto frontier influenced mobile deployment targets.
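The compound-scaling arithmetic from the EfficientNet paper, as a small worked example (coefficients α=1.2, β=1.1, γ=1.15 are the paper's grid-searched values):

```python
# EfficientNet compound scaling: depth = alpha^phi, width = beta^phi,
# resolution = gamma^phi for compound coefficient phi
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    depth = ALPHA ** phi   # layer-count multiplier
    width = BETA ** phi    # channel multiplier
    res   = GAMMA ** phi   # input-resolution multiplier
    return depth, width, res

# Paper's constraint: alpha * beta^2 * gamma^2 ~= 2, so FLOPs
# (proportional to d * w^2 * r^2) roughly double per unit of phi
flops_factor = ALPHA * BETA**2 * GAMMA**2
print(round(flops_factor, 3))  # -> 1.92
```

Each increment of phi trades roughly 2× FLOPs for accuracy along the Pareto frontier, which is what made the family attractive for mobile budgets.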
19 ViT on ImageNet? 🔥 hard
Answer: Transformers need large data or strong augmentation/regularization—ImageNet-1K alone is far smaller than JFT; hybrid architectures and distillation-heavy recipes mattered early.
20 Still pretrain on IN? ⚡ easy
Answer: Common baseline though larger multimodal corpora grow—ImageNet remains reference for architecture comparisons.

ImageNet Cheat Sheet

Core
  • 1K / synsets
Metric
  • top-1 / top-5
Use
  • Pretrain backbone

💡 Pro tip: Match resize/crop/normalization to weight recipe.
