Updated 2026
Instance Segmentation: 20 Essential Q&A
Separate masks per object instance—Mask R-CNN and the overlap problem.
Advanced
Mask R-CNN · RoIAlign · mask AP · FCOS
1
What is instance segmentation?
⚡ easy
Answer: Each object instance gets its own binary mask and class label. Pixels on two different people both carry the class “person” but belong to different instances.
2
Semantic vs instance?
📊 medium
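Answer: Semantic: one class label per pixel, so all objects of a class share a single region. Instance: N masks for N objects, possibly of the same class, each with a distinct ID so touching or overlapping objects stay separate.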
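The difference is easiest to see on a toy example. The sketch below uses a hypothetical 1×6 image row with two adjacent people; the variable names and the dictionary layout are illustrative assumptions, not a standard format.

```python
# Toy 1x6 image row: two people (class 1) side by side, background is class 0.

# Semantic segmentation: one class id per pixel; the two people merge into one blob.
semantic = [0, 1, 1, 1, 1, 0]

# Instance segmentation: one binary mask per object, each with its own class label.
instances = [
    {"class_id": 1, "mask": [0, 1, 1, 0, 0, 0]},  # person A
    {"class_id": 1, "mask": [0, 0, 0, 1, 1, 0]},  # person B
]


def to_semantic(instances, width):
    """Collapse instance masks into a semantic map; instance identity is lost."""
    out = [0] * width
    for inst in instances:
        for x, on in enumerate(inst["mask"]):
            if on:
                out[x] = inst["class_id"]
    return out


collapsed = to_semantic(instances, 6)
```

Going from instances to a semantic map is a simple collapse, but the reverse is not possible without extra information: the semantic row cannot say where person A ends and person B begins.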
3
How does Mask R-CNN extend Faster R-CNN?
📊 medium
Answer: Adds a parallel mask head: a small FCN on each RoI predicts one m×m binary mask per class (K masks for K classes), trained jointly with the box and class heads.
4
Why RoIAlign?
🔥 hard
Answer: RoIPool quantizes coordinates → misalignment for masks. RoIAlign uses bilinear sampling at exact float locations—critical for pixel-accurate masks.
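A minimal sketch of the sampling difference, assuming a single-channel feature map stored as a list of rows. It shows only the per-point lookup; a full RoIAlign also divides the RoI into bins and pools several sample points per bin. The function names are illustrative.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a float location (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the valid range so border samples stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feat, y, x):
    """RoIPool-style lookup: snap the float location to an integer cell."""
    return feat[int(math.floor(y))][int(math.floor(x))]


feat = [[0.0, 10.0],
        [20.0, 30.0]]

aligned = bilinear_sample(feat, 0.5, 0.5)   # 15.0, blends all four neighbours
snapped = quantized_sample(feat, 0.5, 0.5)  # 0.0, reads only the top-left cell
```

The snapped value ignores where inside the cell the sample actually falls, which is the misalignment that degrades mask quality.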
5
Mask branch output?
📊 medium
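Answer: Typically 28×28 logits per class (in the FPN variant) from a lightweight per-region FCN; at inference the predicted class's mask is resized to the RoI size and binarized, commonly at 0.5.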
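The resize-and-threshold step can be sketched as follows. This is a simplified illustration, assuming an integer box with exclusive right and bottom edges and nearest-neighbour resizing for brevity; real implementations typically interpolate bilinearly. `paste_mask` is a hypothetical helper name.

```python
def paste_mask(prob, box, img_h, img_w, thresh=0.5):
    """Resize an m x m probability mask to a box and binarize it on a full canvas.

    box = (x0, y0, x1, y1) in integer pixels, with x1 and y1 exclusive.
    """
    m = len(prob)
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    canvas = [[0] * img_w for _ in range(img_h)]
    for y in range(max(y0, 0), min(y1, img_h)):
        for x in range(max(x0, 0), min(x1, img_w)):
            # Map the pixel centre back into mask coordinates.
            my = min(int((y - y0 + 0.5) * m / bh), m - 1)
            mx = min(int((x - x0 + 0.5) * m / bw), m - 1)
            canvas[y][x] = 1 if prob[my][mx] >= thresh else 0
    return canvas


# A 2x2 stand-in for the 28x28 head output: confident on the left column only.
prob = [[0.9, 0.2],
        [0.8, 0.1]]
full = paste_mask(prob, box=(1, 1, 5, 5), img_h=6, img_w=6)
```

Here the left half of the 4×4 box becomes foreground and everything else stays background.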
6
Loss on masks?
📊 medium
Answer: Per-pixel sigmoid + BCE on the target class mask only (not softmax over all classes per pixel in the classic formulation).
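As a rough sketch of this loss for a single RoI, assuming logits stored as nested lists of shape [K][m][m]; the function name and data layout are illustrative, not taken from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_loss(logits, target, gt_class):
    """Mean per-pixel binary cross-entropy on the ground-truth class channel only.

    logits: [K][m][m] raw scores, one m x m map per class.
    target: [m][m] binary ground-truth mask for this RoI.
    """
    channel = logits[gt_class]  # the other K-1 channels contribute no loss
    total, n = 0.0, 0
    for row_logit, row_gt in zip(channel, target):
        for z, t in zip(row_logit, row_gt):
            p = min(max(sigmoid(z), 1e-7), 1 - 1e-7)  # clamp for numerical safety
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n


logits = [
    [[0.0, 0.0], [0.0, 0.0]],    # class 0: p = 0.5 everywhere
    [[5.0, -5.0], [5.0, -5.0]],  # class 1: confident about the left column
]
target = [[1, 0], [1, 0]]

loss_cls1 = mask_loss(logits, target, gt_class=1)  # small: prediction matches
loss_cls0 = mask_loss(logits, target, gt_class=0)  # ln 2: uninformative channel
```

Because each class has its own sigmoid channel, classes do not compete per pixel; the class decision is left to the classification head.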
7
Can two instance masks overlap in GT?
⚡ easy
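Answer: Yes, depending on the dataset. Modal annotations label only visible pixels, so overlap is limited to annotation slack; amodal annotations include occluded parts, so the mask of a front object overlaps the one behind it. Either way the model predicts an independent mask per instance, and any ordering is resolved afterwards.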
8
Panoptic segmentation?
📊 medium
Answer: Unifies semantic “stuff” and instance “things” with non-overlapping full-scene labeling—each pixel has one label + optional instance id.
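One simple way to obtain a non-overlapping labeling is to paint instance masks in descending score order on top of a stuff map. The sketch below is a simplified heuristic under that assumption, not the procedure of any specific paper; `merge_panoptic`, the dictionary keys, and the class ids are hypothetical.

```python
def merge_panoptic(stuff_map, instances):
    """Merge a stuff map and instance masks into one label per pixel.

    stuff_map: [h][w] semantic class ids.
    instances: dicts with "score", "class_id", and "mask" ([h][w] binary).
    Returns [h][w] of (class_id, instance_id); instance_id 0 means stuff.
    """
    h, w = len(stuff_map), len(stuff_map[0])
    out = [[(stuff_map[y][x], 0) for x in range(w)] for y in range(h)]
    taken = [[False] * w for _ in range(h)]
    # Higher-scoring instances claim contested pixels first.
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ordered, start=1):
        for y in range(h):
            for x in range(w):
                if inst["mask"][y][x] and not taken[y][x]:
                    out[y][x] = (inst["class_id"], inst_id)
                    taken[y][x] = True
    return out


stuff_map = [[7, 7, 7, 7]]  # class 7 stands in for "sky"
instances = [
    {"score": 0.6, "class_id": 1, "mask": [[0, 1, 1, 0]]},
    {"score": 0.9, "class_id": 1, "mask": [[1, 1, 0, 0]]},
]
panoptic = merge_panoptic(stuff_map, instances)
```

The contested second pixel goes to the higher-scoring instance, and the last pixel keeps its stuff label with instance id 0.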
9
What is YOLACT?
📊 medium
Answer: One-stage: combines prototype masks with per-instance coefficients for fast instance segmentation—speed-quality tradeoff.
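The prototype-plus-coefficients idea can be sketched as a linear combination followed by a sigmoid. This is a minimal illustration with hand-picked values; it omits the box cropping applied to assembled masks, and the names are assumptions.

```python
import math


def assemble_mask(prototypes, coeffs, thresh=0.5):
    """Combine k prototype maps with k per-instance coefficients into one mask.

    prototypes: [k][h][w] image-wide maps shared by all instances.
    coeffs: [k] coefficients predicted for one detected instance.
    """
    h, w = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            z = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            prob = 1.0 / (1.0 + math.exp(-z))
            row.append(1 if prob >= thresh else 0)
        mask.append(row)
    return mask


protos = [
    [[1.0, 1.0, 0.0, 0.0]],  # prototype 0 responds to the left half
    [[0.0, 0.0, 1.0, 1.0]],  # prototype 1 responds to the right half
]
left = assemble_mask(protos, coeffs=[4.0, -4.0])   # [[1, 1, 0, 0]]
right = assemble_mask(protos, coeffs=[-4.0, 4.0])  # [[0, 0, 1, 1]]
```

The prototypes are computed once per image, so each extra instance costs only a small weighted sum, which is where the speed advantage comes from.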
10
SOLO / SOLOv2 idea?
🔥 hard
Answer: Define instance by grid location and scale—predict category and mask for each grid cell without anchors in the traditional sense.
11
DETR for masks?
🔥 hard
Answer: Set prediction with mask head or panoptic head—queries attend to image features to produce instance masks end-to-end.
12
What is mask AP?
📊 medium
Answer: AP computed with mask IoU instead of box IoU. On COCO it is averaged over IoU thresholds 0.50:0.05:0.95 and is the primary metric for instance segmentation quality.
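The building block of mask AP is the IoU between two binary masks. The sketch below covers only that step; full AP additionally ranks predictions by score, matches them to ground truth, and integrates precision over recall. `mask_iou` is an illustrative name.

```python
def mask_iou(a, b):
    """Intersection over union of two same-sized binary masks (lists of rows)."""
    inter = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p | q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 0.0


pred = [[1, 1, 1, 0]]
gt = [[0, 1, 1, 1]]
iou = mask_iou(pred, gt)  # 2 shared pixels / 4 covered pixels = 0.5

# COCO-style thresholds: 0.50, 0.55, ..., 0.95
thresholds = [0.5 + 0.05 * i for i in range(10)]
```

A prediction with IoU 0.5 counts as a match only at the loosest threshold, which is why averaging over thresholds rewards tighter masks.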
13
Polygon vs raster?
⚡ easy
Answer: Datasets may store COCO RLE or polygons; training often rasterizes to fixed resolution masks for loss.
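To make the RLE storage format concrete, here is a small encoder and decoder. It assumes the convention of COCO's uncompressed RLE: pixels are read in column-major order and the counts alternate between runs of 0 and runs of 1, starting with the 0-run. The function names are illustrative and this is not the pycocotools API.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows) as column-major run lengths."""
    h, w = len(mask), len(mask[0])
    flat = [mask[y][x] for x in range(w) for y in range(h)]  # column-major
    counts, prev, run = [], 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)  # may be 0 if the mask starts with foreground
            prev, run = v, 1
    counts.append(run)
    return {"size": [h, w], "counts": counts}


def rle_decode(rle):
    """Rasterize run lengths back into a binary mask."""
    h, w = rle["size"]
    flat, val = [], 0
    for c in rle["counts"]:
        flat.extend([val] * c)
        val = 1 - val
    return [[flat[x * h + y] for x in range(w)] for y in range(h)]


mask = [[0, 1, 1],
        [0, 1, 0]]
rle = rle_encode(mask)      # counts: 2 zeros, 3 ones, 1 zero
restored = rle_decode(rle)
```

Decoding restores the fixed-resolution raster that the mask loss is computed on.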
14
COCO stuff vs things?
📊 medium
Answer: Things are countable instances; stuff is amorphous (grass, sky)—panoptic benchmark merges both.
15
Small instances?
📊 medium
Answer: High-res FPN levels, copy-paste augmentation, and specialized heads help—same challenges as object detection.
16
Why slower than detection?
⚡ easy
Answer: Extra per-RoI mask computation and higher memory—one-stage mask methods aim to close the gap.
17
Role of FPN?
📊 medium
Answer: Multi-scale object proposals and features so small and large instances both get good mask features.
18
HTC / Cascade?
🔥 hard
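Answer: Cascade Mask R-CNN and Hybrid Task Cascade (HTC) iteratively refine boxes and masks through cascaded stages; HTC also interleaves the box and mask branches across stages. Both were strong entries on COCO leaderboards of their era.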
19
Refine boundaries?
🔥 hard
Answer: Methods like PointRend adaptively sample points on uncertain boundaries for fine mask prediction—better edges.
20
Annotation?
⚡ easy
Answer: Instance masks are among the most expensive annotations to collect; interactive tools, synthetic data, and weak supervision are active research areas.
Instance Segmentation Cheat Sheet
Key model
- Mask R-CNN
- RoIAlign
Metric
- Mask AP
Fast
- YOLACT
- Query-based
💡 Pro tip: RoIAlign removes the coordinate-quantization misalignment of RoIPool, which hurts mask accuracy.