Computer Vision Chapter 47

ImageNet

ImageNet is a large-scale image dataset organized by WordNet synsets (noun concepts). The ILSVRC classification challenge popularized a 1,000-class subset; winning architectures (AlexNet, VGG, ResNet, …) transferred features across vision tasks. Full ImageNet licensing and download require registration; frameworks ship pretrained weights trained on those labels so you rarely need the raw corpus for inference. Be mindful of bias and label noise inherited from web data and crowdsourcing.

WordNet and synsets

Each class corresponds to a synset ID (e.g. n01440764 “tench”). Hierarchical relations in WordNet are not always reflected in flat 1-of-K training—hierarchical loss is optional research direction.

ILSVRC tasks

Beyond single-label classification, historical challenges included localization (bbox) and detection. Today, COCO is more common for detection benchmarks, while ImageNet-pretrained backbones remain the default initialization.

Using label names in code

from torchvision.models import resnet50, ResNet50_Weights

w = ResNet50_Weights.IMAGENET1K_V2
names = w.meta["categories"]  # list of 1000 strings
# idx = logits.argmax(dim=1).item()
# print(names[idx])

Different weight versions may share the same 1k label order—confirm in the weights metadata you load.

Transfer learning

Replace the classifier head, freeze early layers optionally, train on your domain. Features are biased toward ImageNet objects; medical or industrial imagery may need more adaptation or different pretraining (SimCLR, CLIP, domain-specific data).

Takeaways

  • ImageNet scale enabled modern CNNs; data governance and consent practices have evolved since early collection.
  • Top-1 / top-5 error reported on fixed val split—do not compare to your test set blindly.
  • Consider newer pretraining (web-scale contrastive) when label semantics differ.

Quick FAQ

A larger superset with thousands of classes is used by some models (ViT, EfficientNet variants); preprocessing and head shapes differ from 1k checkpoints.

A common teaching subset (64×64, 200 classes)—not the official full ImageNet benchmark.