Updated 2026
Semantic Segmentation: 20 Essential Q&A
Pixel-wise class labels, encoder–decoder designs, and how we score dense prediction.
~12 min read
20 questions
Advanced
FCN, U-Net, mIoU, Dice
Quick Navigation
1. What is semantic segmentation?
2. vs image classification
3. Fully convolutional (FCN)
4. U-Net & skip connections
5. Upsampling / decoder
6. mIoU metric
7. Dice / F1 on masks
1
What is semantic segmentation?
⚡ easy
Answer: Assigning a class label to every pixel (road, sky, person)—no distinction between different instances of the same class.
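To make the definition concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score per class for every pixel; the class names and score values are made up for illustration. The label map is just the per-pixel argmax.

```python
# Hypothetical class list and scores, for illustration only.
CLASSES = ["road", "sky", "person"]

def label_map(scores):
    """scores[y][x] is a list of per-class scores; return the argmax class index per pixel."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]

scores = [
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],    # top row: both pixels look like sky
    [[0.9, 0.05, 0.05], [0.2, 0.1, 0.7]],  # bottom row: road, person
]
labels = label_map(scores)
names = [[CLASSES[i] for i in row] for row in labels]
```

Every pixel receives exactly one class index, and nothing in the output says which object a pixel belongs to.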
2
How does it differ from classification?
⚡ easy
Answer: Classification: one label per image. Semantic segmentation: dense spatial map of labels—requires localization and context.
3
What did FCN change?
📊 medium
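Answer: Replaced fully connected layers with convolutions (the FC layers are reinterpreted as convolutions, e.g. 1×1), so arbitrary input sizes work and the output is a spatial score map; learnable upsampling (transposed convolution, often called deconvolution) recovers resolution.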
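A rough illustration of why a 1×1 convolution is size-agnostic: it applies the same linear map to every pixel independently, so the weights do not depend on height or width. The sketch below uses nested lists and made-up weights; it is not taken from any FCN implementation.

```python
def conv1x1(feature_map, weights, bias):
    """Apply one linear map per pixel.

    feature_map: H x W x C_in nested lists
    weights:     C_out x C_in
    bias:        C_out
    """
    return [
        [
            [sum(w * v for w, v in zip(w_row, px)) + b
             for w_row, b in zip(weights, bias)]
            for px in row
        ]
        for row in feature_map
    ]

# Hypothetical classifier head: 2 input channels -> 3 class scores.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [0.0, 0.0, 0.5]

small = conv1x1([[[1.0, 2.0]]], W, B)              # 1 x 1 input
large = conv1x1([[[1.0, 2.0]] * 3] * 2, W, B)      # 2 x 3 input, same weights
```

The same `W` and `B` serve both inputs; only the spatial size of the score map changes.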
4
Why does U-Net use skip connections?
📊 medium
Answer: The encoder downsamples to gather context and the decoder upsamples back to input resolution; skip connections fuse fine detail from shallow layers with deep semantic features, which yields sharper boundaries.
5
Common upsampling methods?
📊 medium
Answer: Transposed convolution, bilinear upsample + conv, sub-pixel shuffle—each trades artifacts, parameters, and speed differently.
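As one example of these methods, sub-pixel shuffle has no parameters of its own: it only rearranges channels into space. The sketch below assumes a channel ordering in which output channel c, sub-position (i, j) is read from input channel c·r² + i·r + j; treat that ordering as an assumption of this example rather than a specification.

```python
def pixel_shuffle(x, r):
    """Rearrange (C*r*r) x H x W nested lists into C x (H*r) x (W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # one low-res channel
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out

# Four 1x1 channels become one 2x2 channel.
up = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], r=2)
```

In a real decoder a convolution before the shuffle learns what to put in those channels; the shuffle itself is a fixed reindexing.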
6
What is mIoU?
📊 medium
Answer: Mean Intersection over Union: compute IoU = |P∩G| / |P∪G| between the predicted and ground-truth masks of each class, then average over classes. It is the standard benchmark metric.
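A small sketch of the computation on flattened label lists. It assumes classes absent from both prediction and ground truth are skipped; benchmarks usually accumulate intersections and unions over the whole dataset before dividing, which this per-sample version does not do.

```python
def mean_iou(pred, target, num_classes):
    """pred, target: flat lists of class indices of equal length."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # ignore classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

pred   = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2
score = mean_iou(pred, target, num_classes=3)
```

Because every class counts equally in the mean, a rare class that is segmented badly pulls the score down as much as a frequent one.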
7
What is Dice coefficient?
📊 medium
Answer: 2|A∩B|/(|A|+|B|), which equals the F1 score for binary masks; a common loss for medical segmentation when the foreground is tiny.
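The link to F1 can be checked numerically. This sketch works on flat binary masks; the `eps` smoothing term is a common convention assumed here to avoid division by zero on empty masks, and as a loss one would typically minimise 1 minus this value.

```python
def dice(pred, target, eps=1e-7):
    """pred, target: flat lists of 0/1 values."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def f1(pred, target):
    """F1 from true positives, false positives and false negatives."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return 2 * tp / (2 * tp + fp + fn)

pred   = [1, 1, 0, 0]
target = [1, 0, 0, 0]
d, f = dice(pred, target), f1(pred, target)  # both close to 2/3
```

The two agree up to the smoothing term because |A| + |B| = 2·TP + FP + FN.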
8
Standard loss?
⚡ easy
Answer: Per-pixel cross-entropy (softmax over classes); can weight rare classes or use focal variants for hard pixels.
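A minimal sketch of per-pixel cross-entropy with optional class weights, written with the standard library only. Normalising by the sum of the weights is one common choice and is an assumption of this example; the logits and weights are made up.

```python
import math

def pixel_cross_entropy(logits, target, class_weights=None):
    """logits: list of per-pixel logit lists; target: list of class indices."""
    total, norm = 0.0, 0.0
    for z, t in zip(logits, target):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(v - m) for v in z))
        w = 1.0 if class_weights is None else class_weights[t]
        total += w * (log_sum - z[t])  # -log softmax(z)[t]
        norm += w
    return total / norm

uniform   = pixel_cross_entropy([[0.0, 0.0]], [0])   # log(2): no information
confident = pixel_cross_entropy([[5.0, 0.0]], [0])   # near zero: correct and sure
weighted  = pixel_cross_entropy([[5.0, 0.0], [5.0, 0.0]], [0, 1],
                                class_weights=[1.0, 3.0])
```

With the weights above, the misclassified pixel of class 1 dominates the average, which is the intended effect when class 1 is rare.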
9
Why are boundaries hard?
🔥 hard
Answer: Edges are ambiguous and thin structures disappear at low resolution. Fixes: deep supervision, boundary-aware loss, high-resolution branches, or larger input crops.
10
Handle class imbalance?
📊 medium
Answer: Weighted CE, oversampling rare classes, focal loss, dice loss, or balanced sampling in batches.
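Of the options listed, focal loss is compact enough to sketch. It scales the cross-entropy of each pixel by (1 − p_t)^γ, where p_t is the predicted probability of the true class, so easy pixels contribute little. The version below assumes softmax probabilities as input and omits the optional class-balancing factor.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """probs: per-pixel class probabilities; target: class indices."""
    losses = [
        -((1.0 - p[t]) ** gamma) * math.log(p[t])
        for p, t in zip(probs, target)
    ]
    return sum(losses) / len(losses)

easy = [[0.9, 0.1]]   # true class predicted with p = 0.9
hard = [[0.2, 0.8]]   # true class predicted with p = 0.2
easy_ce, easy_fl = focal_loss(easy, [0], gamma=0.0), focal_loss(easy, [0])
hard_ce, hard_fl = focal_loss(hard, [0], gamma=0.0), focal_loss(hard, [0])
```

With `gamma=0.0` the function reduces to plain cross-entropy. At the default `gamma=2.0` the easy pixel is scaled by 0.01 and the hard pixel by 0.64.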
11
What is ASPP?
🔥 hard
Answer: Atrous spatial pyramid pooling—parallel dilated convs at multiple rates capture multi-scale context without losing resolution (DeepLab family).
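The effect of the dilation rate is easiest to see in one dimension. The sketch below is a toy: it reuses a single fixed 3-tap kernel for every branch, whereas real ASPP learns separate 2D kernels per rate and usually adds a 1×1 branch and image-level pooling. The function names are hypothetical.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded 'same' 1D cross-correlation with dilation `rate` (odd kernel)."""
    k = len(kernel)
    half = (k // 2) * rate
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * rate] for j in range(k))
        for i in range(len(signal))
    ]

def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run the kernel at several rates and stack the branch outputs per position."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [list(vals) for vals in zip(*branches)]

impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
features = aspp_1d(impulse, kernel=[1, 1, 1])
```

Every branch keeps the input length of 9. The impulse at index 4 reaches index 2 only through the rate-2 branch and index 0 only through the rate-4 branch, which shows how larger rates widen the context without downsampling.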
12
What is the PSPNet idea?
📊 medium
Answer: Pyramid pooling at several scales then upsample and concatenate—rich global scene context for each pixel.
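A toy version of the pool, upsample, concatenate pattern on a single-channel map. It assumes the height and width are divisible by each bin count, uses nearest-neighbour upsampling, and skips the 1×1 convolutions a real pyramid pooling module would apply; the bin sizes are chosen only to keep the example small.

```python
def pool_and_upsample(fmap, bins):
    """Average-pool an H x W map into bins x bins cells, then copy each
    cell mean back to every pixel of that cell."""
    h, w = len(fmap), len(fmap[0])
    ch, cw = h // bins, w // bins
    out = [[0.0] * w for _ in range(h)]
    for by in range(bins):
        for bx in range(bins):
            ys = range(by * ch, (by + 1) * ch)
            xs = range(bx * cw, (bx + 1) * cw)
            mean = sum(fmap[y][x] for y in ys for x in xs) / (ch * cw)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

def pyramid_features(fmap, bin_sizes=(1, 2)):
    """Concatenate the original value with one pooled value per pyramid level."""
    levels = [pool_and_upsample(fmap, b) for b in bin_sizes]
    return [
        [[fmap[y][x]] + [lvl[y][x] for lvl in levels]
         for x in range(len(fmap[0]))]
        for y in range(len(fmap))
    ]

fmap = [
    [0, 2, 4, 6],
    [0, 2, 4, 6],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
]
feats = pyramid_features(fmap)
```

Each pixel ends up with its own value, the global mean (5.5 here), and the mean of its quadrant, so a per-pixel classifier can see scene-level context.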
13
Multi-scale inference?
📊 medium
Answer: Run network on several scales / flipped inputs and average logits—boosts mIoU at inference cost.
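The flip half of this recipe can be sketched without any resizing code. Here `toy_model` is a hypothetical stand-in for a trained network, deliberately given a left-to-right bias so the effect of averaging is visible; extra scales would be handled the same way, by resizing the input, running the model, resizing the logits back, and averaging.

```python
def hflip(rows):
    """Mirror an H x W (x C) nested list left to right."""
    return [row[::-1] for row in rows]

def flip_tta(model, image):
    """Average logits from the image and its horizontal mirror."""
    logits = model(image)
    mirrored = hflip(model(hflip(image)))  # flip prediction back into place
    return [
        [[(a + b) / 2 for a, b in zip(p, q)] for p, q in zip(r1, r2)]
        for r1, r2 in zip(logits, mirrored)
    ]

def toy_model(image):
    """Hypothetical 2-class 'network' whose class-0 score depends on column index."""
    return [[[v + 0.5 * x, -v] for x, v in enumerate(row)] for row in image]

image = [[1.0, 3.0]]
single = toy_model(image)            # [[[1.0, -1.0], [3.5, -3.0]]]
averaged = flip_tta(toy_model, image)
```

The price is one extra forward pass per flip or scale, which is why this is usually reserved for benchmark submissions rather than latency-sensitive deployment.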
14
Weakly supervised segmentation?
🔥 hard
Answer: Train from image tags, scribbles, or bounding boxes using constraints (e.g. MIL, GrabCut-style seeds), so fewer pixel-level labels are needed.
15
Link to panoptic?
📊 medium
Answer: Panoptic segmentation labels every pixel with a class and adds instance IDs for countable “things”, while “stuff” regions get a semantic label only; semantic segmentation is one component of full scene parsing.
16
Use CRF post-processing?
📊 medium
Answer: Historically, CRFs refined CNN outputs with pairwise smoothness terms; they are less dominant now that architectures are stronger, but still come up in interviews.
17
Can semantic separate two people?
⚡ easy
Answer: No—both get label “person”; need instance segmentation for separate masks.
18
Why is data expensive?
⚡ easy
Answer: Pixel-accurate masks take far longer to annotate per image than bounding boxes; semi-automatic labeling tools and synthetic data help.
19
Transformers for segmentation?
🔥 hard
Answer: SegFormer, Mask2Former, Segmenter—global attention and mask queries compete with CNN encoders on benchmarks.
20
Real-time models?
📊 medium
Answer: Lightweight backbones (MobileNet), BiSeNet, Fast-SCNN—trade mIoU for FPS on edge devices.
Semantic Segmentation Cheat Sheet
Architecture
- Encoder–decoder
- Skips (U-Net)
Metric
- mIoU
- Dice (medical)
Context
- ASPP / PSP
- Multi-scale test
💡 Pro tip: Dense per-pixel labels; same class shares one semantic mask.