ImageNet: CV guide

WordNet and synsets

Each class corresponds to a synset ID (e.g. n01440764 “tench”). Hierarchical relations in WordNet are not always reflected in flat 1-of-K training—hierarchical loss is optional research direction.

ILSVRC tasks

Beyond single-label classification, historical challenges included localization (bbox) and detection. Today, COCO is more common for detection benchmarks, while ImageNet-pretrained backbones remain the default initialization.

Using label names in code

from torchvision.models import resnet50, ResNet50_Weights

w = ResNet50_Weights.IMAGENET1K_V2
names = w.meta["categories"]  # list of 1000 strings
# idx = logits.argmax(dim=1).item()
# print(names[idx])

Different weight versions may share the same 1k label order—confirm in the weights metadata you load.

Transfer learning

Replace the classifier head, freeze early layers optionally, train on your domain. Features are biased toward ImageNet objects; medical or industrial imagery may need more adaptation or different pretraining (SimCLR, CLIP, domain-specific data).

                    Takeaways
                    ImageNet scale enabled modern CNNs; data governance and consent practices have evolved since early collection.
Top-1 / top-5 error reported on fixed val split—do not compare to your test set blindly.
Consider newer pretraining (web-scale contrastive) when label semantics differ.

                

Quick FAQ

A larger superset with thousands of classes is used by some models (ViT, EfficientNet variants); preprocessing and head shapes differ from 1k checkpoints.

A common teaching subset (64×64, 200 classes)—not the official full ImageNet benchmark.

Related Computer Vision Links

WordNet and synsets

ILSVRC tasks

Using label names in code

Transfer learning

Takeaways

Quick FAQ

ImageNet-21k?

Tiny ImageNet?