Computer Vision Interview: 20 Essential Q&A (Updated 2026)

ImageNet: 20 Essential Q&A

The dataset that pretrained the modern CV era—structure, tasks, and caveats for transfer learning.

~10 min read · 20 questions · Intermediate
ILSVRC · synset · 1k classes · pretrain
1 What is ImageNet? ⚡ easy
Answer: Large-scale image dataset organized by WordNet synsets—millions of labeled images driving classification pretraining.
2 ILSVRC? 📊 medium
Answer: Annual challenge subset (~1.2M train, 50k val, 1000 classes) used historically for ImageNet-1K classification benchmarks.
3 What is a synset? 📊 medium
Answer: A WordNet synonym set (e.g. a specific dog breed)—each class is a disambiguated noun sense, reducing polysemy.
4 Scale? ⚡ easy
Answer: Roughly 1.28M training images for 1K ILSVRC classes—enough diversity to learn general visual features.
5 Why report top-5 error? 📊 medium
Answer: Fine-grained classes make single exact label harsh—top-5 was standard headline metric during AlexNet era.
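A minimal NumPy sketch of the top-k error computation described above (function name and toy logits are illustrative, not from any particular library):

```python
import numpy as np

def topk_error(logits, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest logits."""
    # Sort class indices by logit, descending, and keep the top k per row
    topk = np.argsort(logits, axis=1)[:, ::-1][:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

logits = np.array([[0.1, 0.9, 0.3, 0.2],
                   [0.8, 0.1, 0.05, 0.05]])
labels = np.array([2, 3])
print(topk_error(logits, labels, k=2))  # -> 0.5 (row 0 hit, row 1 miss)
```

With k=1 this is the ordinary top-1 error; ILSVRC-era leaderboards headlined k=5.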
6 Val vs test? 📊 medium
Answer: Public val for development; test held out for leaderboard—reproducible papers compare on val with fixed split.
7 Canonical preprocessing? 🔥 hard
Answer: Short side resize 256, center crop 224, mean/std normalization—must match weights (different for Inception vs ResNet sometimes).
# ImageNet norm: mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]
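The resize/crop/normalize arithmetic can be sketched in plain NumPy (helper names are illustrative; real pipelines typically use torchvision transforms):

```python
import numpy as np

# ImageNet normalization constants (ResNet-style torchvision weights)
MEAN = np.array([0.485, 0.456, 0.406])
STD  = np.array([0.229, 0.224, 0.225])

def resize_short_side(h, w, target=256):
    """New (h, w) after scaling so the short side equals `target`."""
    if h < w:
        return target, round(w * target / h)
    return round(h * target / w), target

def center_crop_box(h, w, crop=224):
    """Top-left corner of a centered `crop`x`crop` window."""
    return (h - crop) // 2, (w - crop) // 2

def normalize(img):
    """img: float array in [0,1], shape (H, W, 3) -> channel-wise standardized."""
    return (img - MEAN) / STD

h, w = resize_short_side(480, 640)   # -> (256, 341)
print(h, w, center_crop_box(h, w))   # -> 256 341 (16, 58)
```

The same constants must be used at inference as during pretraining; Inception-style weights instead expect inputs scaled to [-1, 1].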
8 Transfer learning role? 📊 medium
Answer: Backbone trained on ImageNet features edges/textures/objects—finetune on small domain datasets with smaller LR.
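A hedged PyTorch sketch of the smaller-LR finetuning setup (the tiny Sequential stands in for a real pretrained backbone such as torchvision's resnet50):

```python
import torch
import torch.nn as nn

# Toy stand-in for an ImageNet-pretrained backbone plus a fresh task head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 10)  # new head for a hypothetical 10-class target task

# Smaller LR for pretrained weights, larger LR for the randomly-init head
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(),     "lr": 1e-2},
], momentum=0.9)

print([g["lr"] for g in optimizer.param_groups])  # -> [0.0001, 0.01]
```

The exact LR ratio is a tuning choice; the pattern of per-group learning rates is the reusable part.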
9 Freeze backbone? 📊 medium
Answer: Train only the head early when data is tiny; unfreeze later—BatchNorm layers need care (running batch stats) during finetuning.
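A minimal PyTorch sketch of the freeze-plus-BatchNorm pattern (toy model; a real setup would freeze a pretrained backbone):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),   # "backbone"
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),  # "head"
)
head = model[5]

# Freeze everything, then re-enable gradients for the head only
for p in model.parameters():
    p.requires_grad = False
for p in head.parameters():
    p.requires_grad = True

# Keep BatchNorm in eval mode so the frozen running stats are used,
# instead of being overwritten by small finetuning batches
model.train()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
```

Forgetting the BatchNorm step is a classic finetuning bug: the conv weights stay frozen while the BN statistics silently drift.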
10 Noise / bias? 🔥 hard
Answer: Crowdsourced labels contain errors; geographic and demographic skew—ImageNet audit projects documented issues.
11 Hierarchical labels? 📊 medium
Answer: WordNet tree enables hierarchical metrics and zero-shot transfer—not all models exploit hierarchy in loss.
12 Classic augmentations? ⚡ easy
Answer: RandomResizedCrop, flip, color jitter—standard on ImageNet training recipes (RRC is critical for ResNet).
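The RandomResizedCrop sampling logic can be sketched in pure Python (this mirrors the classic recipe—area fraction uniform in [0.08, 1.0], aspect ratio log-uniform in [3/4, 4/3]—but is an illustrative reimplementation, not torchvision's code):

```python
import math, random

def sample_rrc_box(h, w, scale=(0.08, 1.0), ratio=(3/4, 4/3), attempts=10):
    """Sample (top, left, crop_h, crop_w) a la RandomResizedCrop."""
    area = h * w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(*log_ratio))
        cw = round(math.sqrt(target_area * aspect))
        ch = round(math.sqrt(target_area / aspect))
        if 0 < cw <= w and 0 < ch <= h:
            return random.randint(0, h - ch), random.randint(0, w - cw), ch, cw
    # Fallback after failed attempts: centered square crop of the short side
    s = min(h, w)
    return (h - s) // 2, (w - s) // 2, s, s
```

The crop is then resized to the training resolution (e.g. 224), giving scale and aspect invariance that fixed center-cropping lacks.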
13 Tiny ImageNet? ⚡ easy
Answer: Teaching subset (200 classes, 64×64)—useful for coursework; not the same distribution as full ImageNet.
14 Relation to Open Images? 📊 medium
Answer: Different project (multi-label, boxes)—don’t confuse with ImageNet-1K single-label classification.
15 Licensing? ⚡ easy
Answer: Images scraped from web with varying rights—research use common; commercial redeployment needs legal review.
16 ObjectNet lesson? 🔥 hard
Answer: Controls viewpoint/background—shows ImageNet-trained models rely on spurious cues; stresses robust evaluation.
17 Beyond single-label? 📊 medium
Answer: Web-scale image-text (CLIP) reduces reliance on pure ImageNet classification for pretraining—still often finetuned with IN-like data.
18 EfficientNet story? 📊 medium
Answer: Compound scaling depth/width/resolution on ImageNet—Pareto frontier influenced mobile deployment targets.
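The compound-scaling arithmetic from the EfficientNet paper, as a small worked example (coefficients α=1.2, β=1.1, γ=1.15 are the paper's grid-searched values):

```python
# EfficientNet compound scaling: depth = alpha^phi, width = beta^phi,
# resolution = gamma^phi for compound coefficient phi
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    depth = ALPHA ** phi   # layer-count multiplier
    width = BETA ** phi    # channel multiplier
    res   = GAMMA ** phi   # input-resolution multiplier
    return depth, width, res

# Paper's constraint: alpha * beta^2 * gamma^2 ~= 2, so FLOPs
# (proportional to d * w^2 * r^2) roughly double per unit of phi
flops_factor = ALPHA * BETA**2 * GAMMA**2
print(round(flops_factor, 3))  # -> 1.92
```

Each increment of phi trades roughly 2× FLOPs for accuracy along the Pareto frontier, which is what made the family attractive for mobile budgets.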
19 ViT on ImageNet? 🔥 hard
Answer: Transformers need large data or strong augmentation/regularization—ImageNet-1K alone is far smaller than JFT; hybrid architectures and distillation-heavy recipes mattered early.
20 Still pretrain on IN? ⚡ easy
Answer: Common baseline though larger multimodal corpora grow—ImageNet remains reference for architecture comparisons.

ImageNet Cheat Sheet

Core
  • 1K / synsets
Metric
  • top-1 / top-5
Use
  • Pretrain backbone

💡 Pro tip: Match resize/crop/normalization to weight recipe.
