Computer Vision Interview: 20 Essential Q&A (Updated 2026)

MS COCO: 20 Essential Q&A

Detection, segmentation, captions, and person keypoints—benchmark details interviewers expect.

~11 min read · 20 questions · Intermediate
Tags: instances · stuff · captions · OKS
1 What is MS COCO? ⚡ easy
Answer: Common Objects in Context—benchmark for detection, segmentation, captions, and keypoints with rich scene images.
2 Which tasks? 📊 medium
Answer: Object detection (bbox), instance segmentation, panoptic segmentation (things + stuff), image captioning, and person keypoints; each task has its own annotation format and evaluation metric.
3 JSON annotations? 📊 medium
Answer: COCO format lists images, categories, annotations with bbox [x,y,w,h], segmentation polygons/RLE, area, iscrowd flag.
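A minimal sketch of one annotation entry, written as a Python dict; all values here are illustrative:
ann = {
    "id": 1,                                   # annotation id (made-up value)
    "image_id": 42,                            # links to an entry in the "images" list
    "category_id": 18,                         # links to the "categories" list
    "bbox": [473.1, 88.9, 64.0, 128.5],        # [x, y, width, height] in pixels
    "segmentation": [[510.7, 90.2, 473.1, 217.4, 537.1, 216.9]],  # polygon(s) or RLE dict
    "area": 4123.5,                            # mask area in pixels
    "iscrowd": 0,                              # 1 for crowd regions stored as RLE
}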
4 Bbox format? ⚡ easy
Answer: Top-left x,y plus width,height in pixels—convert carefully vs xyxy conventions in codebases.
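A tiny helper for the conversion many frameworks need (the helper name is an illustration, not part of any COCO API):
def xywh_to_xyxy(box):
    # COCO [x, y, w, h] (top-left corner plus size) -> [x1, y1, x2, y2] corner format
    x, y, w, h = box
    return [x, y, x + w, y + h]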
5 Instance masks? 📊 medium
Answer: Stored as polygon lists for single instances (iscrowd=0) and compressed RLE for crowd regions (iscrowd=1); Mask R-CNN-style training decodes both to per-instance binary masks.
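Decoding with pycocotools, which handles both polygon and RLE storage; the annotation file path is illustrative:
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")    # illustrative path
ann = coco.loadAnns(coco.getAnnIds())[0]             # grab any annotation
binary_mask = coco.annToMask(ann)                    # H x W uint8 array, 1 inside the object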
6 Panoptic on COCO? 🔥 hard
Answer: Unifies semantic “stuff” and instance “things” with PQ metric—requires non-overlapping label assignment per pixel.
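A toy sketch of the PQ formula, assuming segment matching (true positives require IoU > 0.5) has already been done:
def panoptic_quality(matched_ious, num_fp, num_fn):
    # matched_ious: IoU of each matched (predicted, ground-truth) segment pair
    tp = len(matched_ious)
    return sum(matched_ious) / (tp + 0.5 * num_fp + 0.5 * num_fn)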
7 Captions? 📊 medium
Answer: Five human-written reference captions per image; evaluated with BLEU/CIDEr/SPICE, which reward descriptive, human-like captions.
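Caption annotations are deliberately simple; a sketch of one entry with made-up values:
caption_ann = {
    "id": 987,                                 # annotation id (made-up value)
    "image_id": 42,                            # the image being described
    "caption": "A dog catching a frisbee in a park.",
}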
8 Keypoints? 📊 medium
Answer: 17 body joints for person instances—AP computed with OKS instead of IoU for matching.
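A simplified sketch of OKS for one person; the per-keypoint constants k come from the COCO keypoint eval and are passed in here as an argument:
import numpy as np

def oks(pred, gt, vis, area, k):
    # pred, gt: (17, 2) keypoint coordinates; vis: (17,) ground-truth visibility flags
    # area: object segment area; k: (17,) per-keypoint falloff constants
    d2 = np.sum((pred - gt) ** 2, axis=1)             # squared distance per joint
    e = np.exp(-d2 / (2 * area * k ** 2 + np.spacing(1)))
    labeled = vis > 0
    return e[labeled].mean()                           # average over labeled joints only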
9 Detection mAP on COCO? 🔥 hard
Answer: Primary AP averaged over IoU thresholds 0.5:0.05:0.95 (AP@[.5:.95]) plus AP50, AP75—stricter than VOC AP50 only.
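Standard evaluation loop with pycocotools; file paths are illustrative, and iouType="segm" or "keypoints" gives mask AP and keypoint AP from the same code:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")   # ground truth (illustrative path)
coco_dt = coco_gt.loadRes("detections.json")           # predictions in COCO results format
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate(); ev.accumulate(); ev.summarize()         # prints AP@[.5:.95], AP50, AP75, AP_S/M/L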
10 Mask AP? 📊 medium
Answer: AP computed on mask IoU instead of box IoU—segmentation quality can differ from bbox AP ranking.
11 80 classes? ⚡ easy
Answer: 80 "thing" categories for detection, plus stuff classes in the panoptic/stuff annotations; category ids run 1-90 with gaps, so don't confuse them with the 91-category legacy list in some code.
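Most training code remaps the sparse ids to contiguous labels; a sketch (path illustrative):
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")       # illustrative path
cat_ids = sorted(coco.getCatIds())                      # 80 ids drawn from 1-90, with gaps
cat_to_label = {cid: i for i, cid in enumerate(cat_ids)}        # contiguous 0-79 labels for training
label_to_cat = {lbl: cid for cid, lbl in cat_to_label.items()}  # map back before writing results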
12 train/val/test? 📊 medium
Answer: train2017, val2017 public; test-dev hidden for leaderboard—papers report val metrics for fair comparison.
13 pycocotools? 📊 medium
Answer: Official eval code for mAP, mask IoU, RLE decode—implementations should match to reproduce leaderboard numbers.
from pycocotools.coco import COCO
coco = COCO("annotations.json")  # loads the file and builds image/annotation/category indexes
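A typical lookup once the index is built (which image gets picked here is arbitrary):
img = coco.loadImgs(coco.getImgIds()[0])[0]             # metadata dict: file_name, height, width, ...
anns = coco.loadAnns(coco.getAnnIds(imgIds=img["id"]))  # all annotations for that image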
14 Small objects? 📊 medium
Answer: COCO reports AP_S/AP_M/AP_L by object area (small < 32², medium 32²–96², large > 96² pixels); models struggle most on small objects, which anchor and FPN designs target.
15 iscrowd flag? 🔥 hard
Answer: Marks group/crowd regions where separating instances is ambiguous; they are stored as RLE and treated as ignore regions in evaluation, so detections matched to them are not counted as false positives.
16 Eval servers? ⚡ easy
Answer: Upload predictions for test sets—prevents test overfitting; val is for iteration.
17 Relation to LVIS? 📊 medium
Answer: Long-tail vocabulary extension—similar tooling, different frequency spectrum; training often joint with COCO.
18 Image source? ⚡ easy
Answer: Flickr-licensed photos of everyday scenes—more contextual clutter than ImageNet object-centric photos.
19 Why default benchmark? 📊 medium
Answer: Challenging scale, multi-task labels, standardized API—dominant for comparing detectors and instance seg models.
20 Common pitfalls? 📊 medium
Answer: Wrong bbox convention, ignoring iscrowd, different NMS thresholds, or not using official eval—numbers won’t match papers.
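Predictions go to the evaluator as a flat JSON list; a sketch of one detection entry with made-up values:
detection = {
    "image_id": 42,
    "category_id": 18,                      # original COCO id, not a contiguous training label
    "bbox": [473.1, 88.9, 64.0, 128.5],     # [x, y, w, h], not [x1, y1, x2, y2]
    "score": 0.93,
}
# dump a list of such dicts to JSON, then load with coco_gt.loadRes(...) and run COCOeval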

COCO Cheat Sheet

Det: AP@[.5:.95]
Seg: RLE masks
Tools: pycocotools

💡 Pro tip: COCO mAP is mean over IoU 0.5–0.95—not just AP50.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.
