Transfer Learning — 15 Interview Questions

Pretrained backbones, linear probes, layer-wise learning rates, and when transfer hurts (negative transfer).


Tags: Pretrained · Freeze · Fine-tune · Domain
1. Define transfer learning. (Easy)
Answer: Reuse knowledge from a source task (often large-scale pretraining) to improve a target task with less data, time, or compute than training from scratch.
2. Feature extraction vs. fine-tuning. (Easy)
Answer: Feature extraction: freeze the backbone and train only a classifier head on your labels. Fine-tuning: update the backbone (all layers or just the top ones), usually with a smaller LR than the head.
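A minimal PyTorch sketch of both modes, assuming torchvision ≥ 0.13 for the weights enum, a ResNet-18 backbone, and a hypothetical 10-way target task:

    import torch.nn as nn
    from torchvision import models

    # Feature extraction: freeze the pretrained backbone, train only a new head.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                     # freeze everything
    model.fc = nn.Linear(model.fc.in_features, 10)  # new 10-way head; trainable by default

    # Fine-tuning instead: re-enable gradients on the backbone (all of it, or only
    # the top layers) and train end to end, typically with a smaller backbone LR.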
3. When should you freeze early layers? (Medium)
Answer: With a small dataset or a high risk of overfitting. Early layers learn generic edges/textures, while later layers are more task-specific. Freeze first, then optionally unfreeze with care.
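Continuing the ResNet-18 sketch above, partial unfreezing of only the last stage and the head might look like this (layer4 and fc are torchvision's names for those modules):

    # Keep the generic early stages frozen; adapt only the most task-specific parts.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.layer4.parameters():   # last residual stage
        p.requires_grad = True
    for p in model.fc.parameters():       # classification head
        p.requires_grad = True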
4. Why is ImageNet pretraining common? (Easy)
Answer: Large, diverse natural-image supervision produces general low-level and mid-level filters that transfer well to many vision tasks (with caveats for distant domains such as medical or satellite imagery).
5. Discriminative learning rates. (Medium)
Answer: Use a lower LR for early layers and a higher LR for the head/new layers; this preserves generic features while adapting the top of the network.
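One way to express this is with optimizer parameter groups, continuing the ResNet example in a full fine-tuning setting (the LR values are illustrative):

    import torch

    # Small LR for the pretrained backbone, larger LR for the freshly initialized head.
    backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
    head_params = list(model.fc.parameters())

    optimizer = torch.optim.AdamW([
        {"params": backbone_params, "lr": 1e-5},  # gentle updates preserve generic features
        {"params": head_params,     "lr": 1e-3},  # the new head can learn quickly
    ])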
6. Linear probe vs. fine-tune (evaluation). (Medium)
Answer: Linear probe: train only a linear layer on frozen features; it measures representation quality. Full fine-tuning measures end-task performance with adaptation.
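A common probe recipe is to cache frozen features once and fit a linear classifier on them. A sketch with scikit-learn, assuming the ResNet-18 from above and a hypothetical loader yielding (images, labels) batches:

    import torch
    from sklearn.linear_model import LogisticRegression

    # Strip the head so the network outputs pooled features instead of logits.
    feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
    feature_extractor.eval()

    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:                      # hypothetical DataLoader
            f = feature_extractor(x).flatten(1)  # (batch, 512) for ResNet-18
            feats.append(f)
            labels.append(y)

    X = torch.cat(feats).numpy()
    y = torch.cat(labels).numpy()
    probe = LogisticRegression(max_iter=1000).fit(X, y)  # probe accuracy ≈ representation quality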
7. Domain shift / domain gap. (Medium)
Answer: Source and target data differ in distribution, so transfer may degrade; you may need more fine-tuning, domain adaptation, or pretraining closer to the target domain.
8. What is negative transfer? (Hard)
Answer: The pretrained model does worse than random initialization on the target task because the source task is misaligned or carries harmful biases; with very large target data, training from scratch sometimes wins.
9. Transfer in NLP (BERT-style). (Easy)
Answer: Pretrain on large text corpora with masked language modeling (and next-sentence prediction); fine-tune the whole model on the downstream task with a task-specific head. This was the dominant paradigm before the instruction-tuning era.
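A minimal Hugging Face transformers sketch of the pattern (the checkpoint name, two-class setup, and toy batch are illustrative):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Pretrained encoder + freshly initialized classification head.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
    outputs = bert(**batch, labels=torch.tensor([1, 0]))  # returns loss and logits
    outputs.loss.backward()                                # one fine-tuning step, before optimizer.step()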
10. When is replacing only the classification head enough? (Easy)
Answer: When the new task matches the pretraining modality and semantics (e.g. 1000-way → N-way object classification) and data is limited; it is a fast and strong baseline.
11. Data augmentation with fine-tuning. (Easy)
Answer: Still critical: it reduces overfitting on small target sets. Match augmentations to the target domain (e.g. no extreme color jitter on medical images).
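As an example of domain-matched augmentation, a torchvision pipeline for a natural-image target might look like this; for medical or satellite data you would drop or soften the color jitter and aggressive crops:

    from torchvision import transforms

    train_tfms = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),  # keep mild; skip for sensitive domains
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],        # ImageNet stats to match the backbone
                             std=[0.229, 0.224, 0.225]),
    ])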
12. Catastrophic forgetting (in the transfer context). (Hard)
Answer: Fine-tuning can erase source capabilities, which matters in continual learning; mitigations include a small LR, elastic weight consolidation (EWC), or multi-task training.
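For intuition, an EWC-style regularizer adds a quadratic penalty pulling weights toward their source-task values. A minimal sketch, assuming fisher and star_params are dicts of per-parameter Fisher estimates and source weights computed beforehand (hypothetical names):

    import torch

    def ewc_penalty(model, fisher, star_params, lam=0.4):
        """Quadratic pull toward source-task weights, weighted by Fisher information."""
        penalty = 0.0
        for name, p in model.named_parameters():
            if name in fisher:
                penalty = penalty + (fisher[name] * (p - star_params[name]) ** 2).sum()
        return lam * penalty

    # total_loss = task_loss + ewc_penalty(model, fisher, star_params)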
13. Self-supervised pretraining, then fine-tune. (Medium)
Answer: Pretrain without labels (contrastive learning, MAE, etc.), then fine-tune with supervision; this scales well when labeled data is scarce but unlabeled data is plentiful.
14. With more target data, should you train from scratch? (Medium)
Answer: With very large in-domain labeled data the gap shrinks, but starting from pretrained weights is still often faster; the trade-off is empirical per task.
15. Interview one-liner: why use transfer learning? (Easy)
Answer: Better accuracy, faster, with less labeled data, by reusing generic features learned at scale; the standard approach for vision and NLP.
Mention the freeze → unfreeze progression and a smaller LR on the backbone.

Quick review checklist

  • Feature extraction vs fine-tuning; when to freeze.
  • Discriminative LR; domain shift; negative transfer.
  • Linear probe; NLP BERT pattern; self-supervised pretrain.