Updated 2026
R-CNN Family: 20 Essential Q&A
From selective search to RPN and feature pyramids—the two-stage detector story.
~12 min read
20 questions
Advanced
RPN · RoI · FPN · Cascade
1
Original R-CNN steps?
📊 medium
Answer: Propose ~2k regions (selective search) → warp each to a fixed size → extract CNN features → classify with per-class SVMs and refine with a bbox regressor. Convolution is not shared across regions, so it is very slow.
2
Main bottleneck?
⚡ easy
Answer: Running CNN thousands of times per image on warped crops; also disk caching of features in early work.
3
What did Fast R-CNN fix?
📊 medium
Answer: Run CNN once on full image; project RoIs onto feature map → RoI pool to fixed size → heads—big speedup + end-to-end backprop.
4
How RoI pooling works?
📊 medium
Answer: Divide each RoI on feature map into H×W bins; max-pool each bin to fixed output—quantization loses subpixel alignment.
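The binning above can be sketched in pure Python for a single channel. The function name `roi_max_pool`, the inclusive integer RoI convention, and the floor/ceil bin edges are assumptions made for illustration; real implementations differ in these details.

```python
import math

def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one RoI of a 2D feature map into an out_h x out_w grid.

    feature: 2D list of floats (one channel).
    roi: (x1, y1, x2, y2) in feature-map cells, inclusive, already rounded
         to integers (that rounding is the first quantization step).
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    out = []
    for i in range(out_h):
        # Bin edges snap to whole cells: the second quantization step.
        ys = y1 + math.floor(i * roi_h / out_h)
        ye = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * roi_w / out_w)
            xe = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        out.append(row)
    return out

# 6x6 toy feature map with value 6*row + col.
feature = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = roi_max_pool(feature, (1, 1, 5, 4), 2, 2)
print(pooled)  # [[15, 17], [27, 29]]
```

The RoI here is 5 cells wide, so the two column bins overlap at one cell. This uneven, snapped binning is the misalignment that RoIAlign later removes.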
5
What is Faster R-CNN?
🔥 hard
Answer: Replaces selective search with RPN that shares full-image conv features—learned proposals, joint training with detector.
6
What does the RPN output?
🔥 hard
Answer: At each anchor location: objectness logits and box deltas to refine anchors—proposals passed to RoI head.
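As a rough sketch of how box deltas refine an anchor, the snippet below uses the common center/size parameterization (offsets scaled by anchor size, log-scale width and height). The helper name `decode_box` and the corner-format boxes are assumptions for illustration.

```python
import math

def decode_box(anchor, deltas):
    """Apply (tx, ty, tw, th) to an anchor given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    tx, ty, tw, th = deltas
    cx = acx + tx * aw          # shift center by a fraction of anchor width
    cy = acy + ty * ah          # shift center by a fraction of anchor height
    w = aw * math.exp(tw)       # scale width in log space
    h = ah * math.exp(th)       # scale height in log space
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

def objectness(logit):
    """Sigmoid turns a single objectness logit into a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

anchor = (0.0, 0.0, 10.0, 20.0)
proposal = decode_box(anchor, (0.1, 0.0, math.log(2.0), 0.0))
print(proposal)  # approximately (-4.0, 0.0, 16.0, 20.0)
```

Zero deltas return the anchor unchanged, which is why the regression targets stay small for well-matched anchors.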
7
Anchor scales/aspect ratios?
📊 medium
Answer: Multiple templates per location cover different object shapes; k anchors per cell → many candidate boxes before filtering by score + NMS.
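A minimal sketch of anchor generation, assuming a stride-16 feature map and the 3 scales × 3 aspect ratios (k = 9) reported in the Faster R-CNN paper. The function name `make_anchors` and the cell-center placement are illustrative assumptions.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return (x1, y1, x2, y2) anchors, len(scales) * len(ratios) per cell."""
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            # Assumed convention: anchors are centered on each cell.
            cx = (gx + 0.5) * stride
            cy = (gy + 0.5) * stride
            for s in scales:
                for r in ratios:           # r = height / width
                    w = s / math.sqrt(r)   # keeps area equal to s * s
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(2, 3, 16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 54 = 2 * 3 cells * 9 anchors
```

Even this 2×3 toy map produces 54 candidates, so a full-size feature map yields tens of thousands of boxes before score filtering and NMS.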
8
Losses in Faster R-CNN?
🔥 hard
Answer: RPN: binary CE for objectness + smooth L1 for box deltas on assigned anchors; detector head: multi-class CE + bbox regression on positive RoIs.
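The two loss ingredients can be sketched in a few lines. The `beta` transition point is an added generalization (assumed here; `beta=1.0` gives the classic smooth L1), and the function names are illustrative.

```python
import math

def smooth_l1(x, beta=1.0):
    """Quadratic near zero, linear for large errors (less outlier-sensitive)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def box_loss(pred, target, beta=1.0):
    """Sum smooth L1 over the four box deltas of one positive sample."""
    return sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))

def binary_ce(logit, label):
    """Binary cross-entropy on one objectness logit; label is 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

print(smooth_l1(0.5))   # 0.125 (quadratic branch)
print(smooth_l1(-2.0))  # 1.5   (linear branch)
print(box_loss((0.5, 2.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)))  # 1.625
```

Box regression is computed only for positive anchors or RoIs; negatives contribute to the classification term alone.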
9
Why FPN?
🔥 hard
Answer: A single high-level feature map is semantically strong but too coarse for small objects. FPN builds a top-down pyramid with lateral connections, so RoI features can be taken from a level that matches the object's scale.
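With a pyramid, each RoI has to be routed to one level. A sketch of the heuristic from the FPN paper, k = floor(k0 + log2(sqrt(w·h) / 224)), is shown below. The clamping range P2 to P5 and the function name `fpn_level` are assumptions for illustration.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pick a pyramid level for an RoI of size w x h (in input pixels)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the available levels

print(fpn_level(224, 224))    # 4: canonical-size RoI goes to P4
print(fpn_level(112, 112))    # 3: half the size, one level finer
print(fpn_level(32, 32))      # 2: small objects use the finest level
print(fpn_level(1000, 1000))  # 5: large objects use the coarsest level
```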
10
RoIAlign role?
📊 medium
Answer: Bilinearly samples features at exact, unquantized RoI locations instead of rounding to cells; introduced in Mask R-CNN for alignment-sensitive mask prediction.
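A simplified single-channel sketch of the idea: keep the RoI in floating-point coordinates, take a few bilinear samples per bin, and average them. The names `bilinear` and `roi_align`, the 2×2 samples per bin, and treating `feature[y][x]` as the value at point (y, x) are assumptions; library implementations differ on half-pixel offsets.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map at continuous (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)   # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_h, out_w, samples=2):
    """Average samples x samples bilinear samples per bin; roi is not rounded."""
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out

feature = [[r * 6 + c for c in range(6)] for r in range(6)]
aligned = roi_align(feature, (0.5, 0.5, 2.5, 2.5), 2, 2)
print(aligned)  # approximately [[7.0, 8.0], [13.0, 14.0]]
```

Because the toy feature map is linear in (y, x), each output equals the map's value at the bin center, which shows that a fractional RoI is honored rather than snapped to cells.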
11
What is Cascade R-CNN?
🔥 hard
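Answer: Sequence of detector stages trained with increasing IoU thresholds for positives, each stage refining the boxes passed to the next. This avoids the overfitting and train/inference quality mismatch of a single high-IoU detector and improves AP, especially at strict IoU.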
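To illustrate the per-stage label assignment, the sketch below checks one box against the thresholds 0.5, 0.6, and 0.7 used in the Cascade R-CNN paper. It is a simplification: a real cascade re-labels the boxes refined by the previous stage, whereas this labels one fixed box. The function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def stage_labels(box, gt, thresholds=(0.5, 0.6, 0.7)):
    """Per stage: does this box count as a positive for that stage's head?"""
    overlap = iou(box, gt)
    return [overlap >= t for t in thresholds]

gt = (0.0, 0.0, 10.0, 10.0)
box = (0.0, 0.0, 10.0, 6.5)    # IoU with gt is 0.65
print(stage_labels(box, gt))   # [True, True, False]
```

A box that is positive for the first two stages but not the third must be refined further before the strictest head will train on it.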
12
NMS placement?
⚡ easy
Answer: After RPN (proposal NMS) and usually after final class-specific boxes—removes duplicate detections.
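A minimal greedy NMS sketch in pure Python. The 0.5 IoU threshold and the function names are assumptions; production code uses vectorized or GPU implementations.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep  # indices into boxes, highest score first

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

For the final detections this is usually run per class, so overlapping boxes of different classes do not suppress each other.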
13
Approximate joint training?
📊 medium
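Answer: Historically alternating or 4-step training. Approximate joint training merges the RPN and detector losses in one network but ignores the gradient with respect to the proposal box coordinates; modern implementations train this way with a shared backbone and careful sampling.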
14
Two-stage strength?
⚡ easy
Answer: Typically higher mAP especially on challenging datasets vs comparable-era one-stage; slower inference.
15
Mask R-CNN?
📊 medium
Answer: Adds mask branch to Faster R-CNN with RoIAlign—instance segmentation with modest overhead.
16
Keypoint R-CNN?
📊 medium
Answer: Same framework with a keypoint head that predicts one heatmap per keypoint, each trained as a one-hot mask over spatial locations; used for pose estimation.
17
Deformable conv in detectors?
🔥 hard
Answer: Learns per-location offsets for the conv sampling grid, giving better geometric modeling of deformable objects; used in DCN / DCNv2 backbones for Faster R-CNN and R-FCN.
18
What is HTC?
🔥 hard
Answer: Hybrid Task Cascade—interleaves detection and segmentation stages with feature fusion—strong COCO instance segmentation.
19
DETR vs R-CNN?
📊 medium
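Answer: DETR replaces anchors and NMS with a transformer that predicts a fixed-size set of boxes, trained with one-to-one bipartite (Hungarian) matching; simpler pipeline but different training dynamics and compute.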
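DETR can drop NMS because training matches each ground-truth box to exactly one prediction. The sketch below finds that minimum-cost assignment by brute force over permutations, a stand-in for the Hungarian algorithm that only suits tiny examples. The cost values and the function name `match` are made up for illustration.

```python
from itertools import permutations

def match(cost):
    """cost[i][j]: cost of assigning prediction i to ground truth j.

    Assumes at least as many predictions as ground-truth boxes.
    Returns ({gt_index: pred_index}, total_cost) for the cheapest assignment.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    for preds in permutations(range(n_pred), n_gt):
        total = sum(cost[p][g] for g, p in enumerate(preds))
        if total < best_cost:
            best, best_cost = preds, total
    return {g: p for g, p in enumerate(best)}, best_cost

# 3 predictions, 2 ground-truth boxes (hypothetical costs).
cost = [[0.9, 0.2],
        [0.1, 0.8],
        [0.5, 0.5]]
assignment, total = match(cost)
print(assignment)  # {0: 1, 1: 0}
```

Prediction 2 is left unmatched and would be trained toward the "no object" class, so duplicates are discouraged during training rather than filtered afterward.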
20
When choose two-stage today?
⚡ easy
Answer: When max accuracy matters and latency budget allows, or when building on mature frameworks (Detectron2) with many pretrained configs.
R-CNN Family Cheat Sheet
Evolution
- R-CNN → Fast
- Faster + RPN
Add-ons
- FPN
- RoIAlign
Accuracy
- Cascade
- HTC
💡 Pro tip: Faster R-CNN = shared backbone + RPN proposals + RoI head.