AlexNet: 20 Essential Q&A
The architecture that popularized deep CNNs on ImageNet—ReLU, dropout, and GPU scale.
~10 min read
20 questions
Advanced
ImageNet · ReLU · dropout · LRN
1
Why is AlexNet important?
⚡ easy
Answer: Won ImageNet 2012 by a large margin—showed deep CNNs + GPU + data could beat hand-crafted features, sparking the deep learning boom in vision.
2
What was ImageNet 2012?
📊 medium
Answer: 1.2M training images, 1000 classes—AlexNet reached ~15.3% top-5 error vs ~26.2% for the runner-up built on hand-crafted features—a breakthrough margin.
3
Rough architecture?
📊 medium
Answer: Five conv layers (some split across 2 GPUs) + max pooling + three FC layers (4096, 4096, and a 1000-way softmax output)—deeper than prior CNNs for this task.
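A minimal single-GPU PyTorch sketch—channel counts follow the torchvision layout, not the paper's exact two-GPU grouped split:

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Single-GPU AlexNet sketch; sizes follow torchvision, not the 2-GPU grouping."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                      # overlapping pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)            # 256 x 6 x 6 -> 9216
        return self.classifier(x)

model = AlexNet()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```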
4
Why ReLU?
📊 medium
Answer: Faster training than saturating tanh/sigmoid; mitigates vanishing gradient in deep stacks; sparse activations.
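A toy check of the saturation difference (an illustration, not from the paper):

```python
import torch

x = torch.linspace(-5, 5, 11, requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # 1 wherever x > 0, 0 otherwise -- no saturation on the positive side

y = torch.linspace(-5, 5, 11, requires_grad=True)
torch.tanh(y).sum().backward()
print(y.grad)   # shrinks toward 0 at both tails -- gradients vanish in deep stacks
```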
5
Use of dropout?
📊 medium
Answer: Regularizes the huge FC layers by randomly zeroing activations during training (p=0.5 on the first two FC layers)—reduces co-adaptation on the training set.
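Dropout is only active in train mode; in PyTorch this is a one-liner (the tensor size here is just illustrative):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)               # p=0.5 on the first two FC layers in the paper
x = torch.ones(1, 4096)

drop.train()
print((drop(x) == 0).float().mean())   # ~0.5: half the units zeroed, survivors scaled by 2
drop.eval()
print((drop(x) == 0).float().mean())   # 0.0: dropout is a no-op at inference
```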
6
What was LRN?
🔥 hard
Answer: Local response normalization—lateral inhibition across nearby channels; later largely replaced by batch norm; minor effect in hindsight.
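PyTorch ships an LRN layer; the hyperparameters below are the ones reported in the paper:

```python
import torch
import torch.nn as nn

# n=5, k=2, alpha=1e-4, beta=0.75 as in the AlexNet paper
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 96, 55, 55)     # e.g. the feature map after conv1 (paper sizes)
print(lrn(x).shape)                # same shape; each activation divided by a local sum over channels
```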
7
Overlapping pooling?
📊 medium
Answer: Pooling stride smaller than the pool window (3×3 window, stride 2)—slightly richer downsampling than non-overlapping pooling; less common in newer nets.
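Side by side in PyTorch (feature-map size taken from the paper's conv1 output as an example):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)
overlapping = nn.MaxPool2d(kernel_size=3, stride=2)   # window > stride (AlexNet)
disjoint = nn.MaxPool2d(kernel_size=2, stride=2)      # window == stride
print(overlapping(x).shape, disjoint(x).shape)        # both roughly halve H/W; overlapping windows share pixels
```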
8
Two GPUs?
⚡ easy
Answer: Model split across GPUs due to memory limits—cross-GPU connections only on certain layers (engineering constraint of the time).
9
Augmentation?
📊 medium
Answer: Random 224×224 crops and horizontal flips from 256×256 images, plus PCA-based color jitter—reduces overfitting and increases effective data.
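A torchvision pipeline in the same spirit—ColorJitter stands in for the paper's PCA-based lighting noise, and the normalization stats are the usual ImageNet values:

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),          # random 224x224 crops from the 256x256 image
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # stand-in for PCA color jitter
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```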
10
Parameters?
⚡ easy
Answer: On the order of 60M—mostly in the FC layers; later architectures cut FC parameters with global average pooling (GAP).
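A quick way to check where the parameters live, using torchvision's AlexNet:

```python
from torchvision.models import alexnet

model = alexnet(weights=None)
total = sum(p.numel() for p in model.parameters())
fc_only = sum(p.numel() for p in model.classifier.parameters())
print(f"total: {total/1e6:.1f}M, FC head: {fc_only/1e6:.1f}M")  # ~61M total, ~59M in the FC layers
```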
11
Training details?
📊 medium
Answer: SGD with momentum 0.9, weight decay 5e-4, learning rate divided by 10 when validation error plateaued—a long schedule (~90 epochs) on two GPUs.
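Optimizer setup with those reported hyperparameters—the stand-in model and plateau metric below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)           # placeholder; an AlexNet instance would go here
# Paper settings: SGD, momentum 0.9, weight decay 5e-4, LR 0.01 divided by 10 on plateaus
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

val_error = 0.5                    # placeholder validation metric
scheduler.step(val_error)          # call once per epoch with the monitored metric
```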
12
Overfitting risk?
📊 medium
Answer: Large capacity vs data—addressed by dropout, aug, and weight decay; still a concern for smaller datasets when fine-tuning.
13
vs VGG?
📊 medium
Answer: VGG uses uniform 3×3 conv stacks, is deeper and more systematic—higher accuracy at more compute; AlexNet is shallower with an irregular layer design.
14
vs ResNet?
📊 medium
Answer: ResNet adds residuals enabling much deeper nets—AlexNet depth modest by today’s standards.
15
Use AlexNet now?
⚡ easy
Answer: Mostly for teaching/history; ResNet/EfficientNet backbones dominate transfer learning—AlexNet too weak/slow vs modern alternatives.
16
Typical input?
📊 medium
Answer: 224×224 crops from a 256×256 resized image—the standard pipeline referenced in many papers.
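The matching evaluation-time pipeline (ImageNet normalization stats assumed):

```python
from torchvision import transforms

eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),    # the standard 256 -> 224 crop referenced above
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```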
17
Output layer?
⚡ easy
Answer: 1000-way softmax for ImageNet classes—cross-entropy loss during training.
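In PyTorch the model emits raw logits; cross-entropy applies log-softmax internally, and an explicit softmax gives class probabilities at inference:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 1000)              # batch of 4, 1000 ImageNet classes
targets = torch.randint(0, 1000, (4,))
loss = F.cross_entropy(logits, targets)    # log-softmax + negative log-likelihood
probs = logits.softmax(dim=1)              # 1000-way softmax
print(loss.item(), probs.sum(dim=1))       # probabilities sum to 1 per image
```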
18
Obsolete?
⚡ easy
Answer: For production accuracy, yes; for pedagogy and history, still the canonical “first big win” story.
19
Impact beyond vision?
⚡ easy
Answer: Validated deep learning at scale—influenced the later waves in speech and NLP; proved the GPUs + data + depth recipe.
20
Modern small nets?
📊 medium
Answer: MobileNet and EfficientNet achieve better accuracy per FLOP—mobile/edge deployments rarely use AlexNet-sized FC heads.
AlexNet Cheat Sheet
Breakthrough
- ImageNet 2012
Ideas
- ReLU
- Dropout
Today
- Historical
- Superseded
💡 Pro tip: Name ImageNet 2012 + ReLU + dropout + GPUs.