Transfer Learning — 15 Interview Questions

Pretrained backbones, linear probes, layer-wise learning rates, and when transfer hurts (negative transfer).


Tags: Pretrained · Freeze · Fine-tune · Domain
1. Define transfer learning. (Easy)
Answer: Reuse knowledge from a source task (often large-scale pretraining) to improve a target task with less data, time, or compute than training from scratch.
2. Feature extraction vs. fine-tuning. (Easy)
Answer: Feature extraction: freeze the backbone and train only a classifier head on your labels. Fine-tuning: update the backbone (all layers or just the top ones), usually with a smaller LR than the head.
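A minimal PyTorch sketch of both modes, assuming torchvision ≥ 0.13 for the weights enum, a ResNet-18 backbone, and a hypothetical 10-way target task:

    import torch.nn as nn
    from torchvision import models

    # Feature extraction: freeze the pretrained backbone, train only a new head.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                     # freeze everything
    model.fc = nn.Linear(model.fc.in_features, 10)  # new 10-way head; trainable by default

    # Fine-tuning instead: re-enable gradients on the backbone (all of it, or only
    # the top layers) and train end to end, typically with a smaller backbone LR.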
3. When should you freeze early layers? (Medium)
Answer: With a small dataset or a high risk of overfitting. Early layers learn generic edges/textures, while later layers are more task-specific. Freeze first, then optionally unfreeze with care.
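Continuing the ResNet-18 sketch above, partial unfreezing of only the last stage and the head might look like this (layer4 and fc are torchvision's names for those modules):

    # Keep the generic early stages frozen; adapt only the most task-specific parts.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.layer4.parameters():   # last residual stage
        p.requires_grad = True
    for p in model.fc.parameters():       # classification head
        p.requires_grad = True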
4. Why is ImageNet pretraining common? (Easy)
Answer: Large, diverse natural-image supervision produces general low-level and mid-level filters that transfer well to many vision tasks (with caveats for distant domains such as medical or satellite imagery).
5. Discriminative learning rates. (Medium)
Answer: Use a lower LR for early layers and a higher LR for the head/new layers; this preserves generic features while adapting the top of the network.
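One way to express this is with optimizer parameter groups, continuing the ResNet example in a full fine-tuning setting (the LR values are illustrative):

    import torch

    # Small LR for the pretrained backbone, larger LR for the freshly initialized head.
    backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
    head_params = list(model.fc.parameters())

    optimizer = torch.optim.AdamW([
        {"params": backbone_params, "lr": 1e-5},  # gentle updates preserve generic features
        {"params": head_params,     "lr": 1e-3},  # the new head can learn quickly
    ])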
6. Linear probe vs. fine-tune (evaluation). (Medium)
Answer: Linear probe: train only a linear layer on frozen features; it measures representation quality. Full fine-tuning measures end-task performance with adaptation.
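A common probe recipe is to cache frozen features once and fit a linear classifier on them. A sketch with scikit-learn, assuming the ResNet-18 from above and a hypothetical loader yielding (images, labels) batches:

    import torch
    from sklearn.linear_model import LogisticRegression

    # Strip the head so the network outputs pooled features instead of logits.
    feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
    feature_extractor.eval()

    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:                      # hypothetical DataLoader
            f = feature_extractor(x).flatten(1)  # (batch, 512) for ResNet-18
            feats.append(f)
            labels.append(y)

    X = torch.cat(feats).numpy()
    y = torch.cat(labels).numpy()
    probe = LogisticRegression(max_iter=1000).fit(X, y)  # probe accuracy ≈ representation quality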
7. Domain shift / domain gap. (Medium)
Answer: Source and target data differ in distribution, so transfer may degrade; you may need more fine-tuning, domain adaptation, or pretraining closer to the target domain.
8. What is negative transfer? (Hard)
Answer: The pretrained model does worse than random initialization on the target task because the source task is misaligned or carries harmful biases; with very large target data, training from scratch sometimes wins.
9. Transfer in NLP (BERT-style). (Easy)
Answer: Pretrain on large text corpora with masked language modeling (and next-sentence prediction); fine-tune the whole model on the downstream task with a task-specific head. This was the dominant paradigm before the instruction-tuning era.
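A minimal Hugging Face transformers sketch of the pattern (the checkpoint name, two-class setup, and toy batch are illustrative):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Pretrained encoder + freshly initialized classification head.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
    outputs = bert(**batch, labels=torch.tensor([1, 0]))  # returns loss and logits
    outputs.loss.backward()                                # one fine-tuning step, before optimizer.step()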
10. When is replacing only the classification head enough? (Easy)
Answer: When the new task matches the pretraining modality and semantics (e.g. 1000-way → N-way object classification) and data is limited; it is a fast and strong baseline.
11. Data augmentation with fine-tuning. (Easy)
Answer: Still critical: it reduces overfitting on small target sets. Match augmentations to the target domain (e.g. no extreme color jitter on medical images).
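As an example of domain-matched augmentation, a torchvision pipeline for a natural-image target might look like this; for medical or satellite data you would drop or soften the color jitter and aggressive crops:

    from torchvision import transforms

    train_tfms = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),  # keep mild; skip for sensitive domains
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],        # ImageNet stats to match the backbone
                             std=[0.229, 0.224, 0.225]),
    ])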
12. Catastrophic forgetting (in the transfer context). (Hard)
Answer: Fine-tuning can erase source capabilities, which matters in continual learning; mitigations include a small LR, elastic weight consolidation (EWC), or multi-task training.
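For intuition, an EWC-style regularizer adds a quadratic penalty pulling weights toward their source-task values. A minimal sketch, assuming fisher and star_params are dicts of per-parameter Fisher estimates and source weights computed beforehand (hypothetical names):

    import torch

    def ewc_penalty(model, fisher, star_params, lam=0.4):
        """Quadratic pull toward source-task weights, weighted by Fisher information."""
        penalty = 0.0
        for name, p in model.named_parameters():
            if name in fisher:
                penalty = penalty + (fisher[name] * (p - star_params[name]) ** 2).sum()
        return lam * penalty

    # total_loss = task_loss + ewc_penalty(model, fisher, star_params)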
13. Self-supervised pretraining, then fine-tune. (Medium)
Answer: Pretrain without labels (contrastive learning, MAE, etc.), then fine-tune with supervision; this scales well when labeled data is scarce but unlabeled data is plentiful.
14. With more target data, should you train from scratch? (Medium)
Answer: With very large in-domain labeled data the gap shrinks, but starting from pretrained weights is still often faster; the trade-off is empirical per task.
15. Interview one-liner: why use transfer learning? (Easy)
Answer: Better accuracy, faster, with less labeled data, by reusing generic features learned at scale; the standard approach for vision and NLP.
Mention the freeze → unfreeze progression and a smaller LR on the backbone.

Quick review checklist

  • Feature extraction vs fine-tuning; when to freeze.
  • Discriminative LR; domain shift; negative transfer.
  • Linear probe; NLP BERT pattern; self-supervised pretrain.