Updated 2026
SORT
SORT & DeepSORT: 20 Essential Q&A
A fast MOT baseline—Kalman + Hungarian—and the Re-ID upgrade for crowded scenes.
~11 min read
20 questions
Advanced
Hungarian, Kalman, cosine, cascade
1
What is SORT?
📊 medium
Answer: Simple Online and Realtime Tracking, a multi-object tracking (MOT) baseline: Kalman filter motion model + Hungarian assignment with an IoU cost between predicted and detected boxes. Very fast.
2
Steps each frame?
📊 medium
Answer: Predict all tracks → match detections to tracks by IoU → update matched with Kalman measurement → create new tracks for unmatched dets → delete stale tracks.
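The predict and update steps above can be sketched with a small constant-velocity Kalman filter. This is a simplification for illustration: the state here is only [x, y, vx, vy] for a box center, whereas SORT also tracks box scale and aspect ratio. The noise values Q and R are assumed, not taken from any implementation.

```python
import numpy as np

dt = 1.0  # one frame per step (assumption)
# State transition for [x, y, vx, vy]: position advances by velocity * dt.
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Measurement model: the detector observes position only.
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01  # process noise (illustrative value)
R = np.eye(2) * 1.0   # measurement noise (illustrative value)

def predict(x, P):
    """Propagate state and covariance one frame forward."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the predicted state with a matched detection z."""
    y = z - H @ x                    # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = np.array([0.0, 0.0, 1.0, 0.0]), np.eye(4)
x_pred, P_pred = predict(x, P)
x_new, P_new = update(x_pred, P_pred, np.array([1.2, 0.1]))
```

The corrected position lands between the prediction and the detection, and the covariance shrinks after the update.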
3
Why Hungarian?
⚡ easy
Answer: Optimal one-to-one assignment minimizing total cost—better than greedy max-IoU for competing hypotheses.
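A small example of where greedy max-IoU and Hungarian assignment disagree. The IoU values are made up for illustration, and `greedy_match` is a hypothetical helper; `linear_sum_assignment` is SciPy's solver for the assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows are tracks, columns are detections (illustrative IoU values).
iou = np.array([[0.60, 0.55],
                [0.50, 0.10]])

def greedy_match(iou):
    """Repeatedly take the highest remaining IoU pair."""
    iou = iou.copy()
    pairs = []
    for _ in range(min(iou.shape)):
        t, d = np.unravel_index(np.argmax(iou), iou.shape)
        pairs.append((int(t), int(d)))
        iou[t, :] = -1.0  # track t is taken
        iou[:, d] = -1.0  # detection d is taken
    return sorted(pairs)

greedy = greedy_match(iou)
rows, cols = linear_sum_assignment(1.0 - iou)  # minimize total (1 - IoU)
hungarian = sorted(zip(rows.tolist(), cols.tolist()))

greedy_total = sum(iou[t, d] for t, d in greedy)
hungarian_total = sum(iou[t, d] for t, d in hungarian)
```

Greedy grabs the 0.60 pair first and leaves track 1 with a 0.10 match; the Hungarian solution trades that for two reasonable matches with a higher total IoU.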
4
Cost matrix?
📊 medium
Answer: Often 1 − IoU (or negative IoU). After assignment, any pair whose IoU falls below a minimum threshold is rejected and both the track and the detection are treated as unmatched.
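A minimal sketch of building the IoU cost matrix and applying the threshold after assignment. Boxes are assumed to be [x1, y1, x2, y2]; the function names and the `iou_min=0.3` default are illustrative choices, not a fixed API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    x1 = np.maximum(a[..., 0], b[..., 0])
    y1 = np.maximum(a[..., 1], b[..., 1])
    x2 = np.minimum(a[..., 2], b[..., 2])
    y2 = np.minimum(a[..., 3], b[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def associate(tracks, dets, iou_min=0.3):
    """Hungarian assignment on 1 - IoU, then drop pairs below iou_min."""
    iou = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(1.0 - iou)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_min]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(iou.shape[0]) if t not in matched_t]
    unmatched_dets = [d for d in range(iou.shape[1]) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[1, 1, 11, 11], [50, 50, 60, 60]]
matches, unmatched_tracks, unmatched_dets = associate(tracks, dets)
```

Here track 1 and detection 1 are paired by the solver but have zero overlap, so the threshold rejects the pair and both end up unmatched.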
5
max_age / min_hits?
📊 medium
Answer: Delete track if unmatched for max_age frames; confirm birth only after min_hits to reduce spurious tracks from false positives.
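The two thresholds can be sketched as simple counters on each track. `TrackLifecycle` is a hypothetical class for illustration, and the defaults `min_hits=3` and `max_age=1` are assumed values to tune per scene.

```python
from dataclasses import dataclass

@dataclass
class TrackLifecycle:
    hit_streak: int = 0          # consecutive frames with a matched detection
    time_since_update: int = 0   # frames since the last match

    def mark_matched(self):
        self.hit_streak += 1
        self.time_since_update = 0

    def mark_missed(self):
        self.hit_streak = 0
        self.time_since_update += 1

    def is_confirmed(self, min_hits=3):
        """Report the track only after enough consecutive matches."""
        return self.hit_streak >= min_hits

    def is_dead(self, max_age=1):
        """Delete the track once it has gone unmatched for too long."""
        return self.time_since_update > max_age

track = TrackLifecycle()
for _ in range(3):
    track.mark_matched()
confirmed_after_three = track.is_confirmed()
track.mark_missed()
alive_after_one_miss = not track.is_dead()
track.mark_missed()
dead_after_two_misses = track.is_dead()
```

A larger max_age keeps tracks alive through short occlusions at the cost of more stale predictions; a larger min_hits suppresses more false positives but delays the first output of a new track.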
6
What does DeepSORT add?
🔥 hard
Answer: A CNN appearance embedding compared by cosine distance, combined with a Mahalanobis gate on motion. This reduces ID switches when IoU alone is ambiguous.
7
Cosine metric learning?
📊 medium
Answer: Train embedding so same-ID images are closer than different-ID—used with triplet or classification losses on person crops.
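As one illustration of the triplet option mentioned above, a triplet loss on cosine distance can be written in a few lines. The margin value and function names are assumptions; a real training setup would compute this over batches inside a deep learning framework.

```python
import numpy as np

def cosine_dist(a, b):
    """1 - cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Zero once the same-ID pair is closer than the different-ID pair by the margin."""
    return max(0.0, cosine_dist(anchor, positive) - cosine_dist(anchor, negative) + margin)

good = triplet_loss([1, 0], [1, 0], [0, 1])  # positive is close, negative is far
bad = triplet_loss([1, 0], [0, 1], [1, 0])   # roles swapped, so the loss is large
```

The loss only pushes on triplets that violate the margin, which is why well-separated identities contribute nothing.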
8
Cascade matching in DeepSORT?
🔥 hard
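Answer: Tracks are matched in order of age: tracks seen in the most recent frame get first pick of detections using the gated appearance cost, then tracks that have been unmatched for progressively longer. Remaining unconfirmed and just-missed tracks fall back to IoU matching. This stops long-occluded tracks, whose Kalman uncertainty has grown, from stealing detections. (Splitting detections by confidence into two stages is the ByteTrack idea, not the DeepSORT cascade.)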
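A minimal sketch of a matching cascade as described in the DeepSORT paper, where tracks that were seen most recently are matched before older ones. The function name, the `max_cost=0.2` threshold, and the toy cost values are assumptions for illustration; the cost matrix is taken as already gated.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_cascade(cost, track_ages, max_age=30, max_cost=0.2):
    """cost[i, j]: appearance cost between track i and detection j.
    track_ages[i]: frames since track i was last matched (1 = seen last frame)."""
    unmatched_dets = list(range(cost.shape[1]))
    matches = []
    for age in range(1, max_age + 1):  # youngest tracks go first
        if not unmatched_dets:
            break
        tracks = [i for i, a in enumerate(track_ages) if a == age]
        if not tracks:
            continue
        sub = cost[np.ix_(tracks, unmatched_dets)]
        rows, cols = linear_sum_assignment(sub)
        taken = set()
        for r, c in zip(rows, cols):
            if sub[r, c] <= max_cost:  # reject matches that are too costly
                matches.append((tracks[r], unmatched_dets[c]))
                taken.add(unmatched_dets[c])
        unmatched_dets = [d for d in unmatched_dets if d not in taken]
    return matches, unmatched_dets

# One detection, two candidate tracks: track 1 has the lower cost but is older.
cost = np.array([[0.15],
                 [0.10]])
matches, unmatched_dets = matching_cascade(cost, track_ages=[1, 5])
```

A single global assignment would give the detection to track 1; the cascade gives it to track 0 because that track was seen in the previous frame.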
9
Mahalanobis gate?
📊 medium
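Answer: Reject an association if the innovation (z − Hx) is unlikely under the predicted covariance, i.e. its squared Mahalanobis distance exceeds a chi-square threshold (DeepSORT uses the 95% quantile, 9.4877 for its 4-D measurement). This filters physically impossible jumps.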
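The gate can be sketched as a squared Mahalanobis distance compared with a chi-square quantile. This example uses a 2-D position measurement for brevity, so the threshold is the 95% quantile for 2 degrees of freedom; the matrices and measurements are illustrative.

```python
import numpy as np

CHI2_95_2DOF = 5.9915  # 95% chi-square quantile for a 2-D measurement

def mahalanobis_sq(z, x_pred, P_pred, H, R):
    """Squared Mahalanobis distance of the innovation z - H x."""
    y = z - H @ x_pred
    S = H @ P_pred @ H.T + R  # innovation covariance
    return float(y @ np.linalg.solve(S, y))

def passes_gate(z, x_pred, P_pred, H, R, threshold=CHI2_95_2DOF):
    return mahalanobis_sq(z, x_pred, P_pred, H, R) <= threshold

H = np.eye(2)
x_pred = np.array([0.0, 0.0])
P_pred = np.eye(2)
R = np.eye(2)

near = np.array([1.0, 1.0])
far = np.array([10.0, 0.0])
d_near = mahalanobis_sq(near, x_pred, P_pred, H, R)  # 1.0
d_far = mahalanobis_sq(far, x_pred, P_pred, H, R)    # 50.0
```

The nearby detection passes the gate and the distant one is rejected before it ever reaches the assignment step.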
10
Descriptor dimension?
⚡ easy
Answer: Typical 128-D L2-normalized vector per detection crop—cosine distance = 1 − dot product.
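A short sketch of why L2 normalization turns cosine distance into 1 − dot product. The helper names are hypothetical and the 2-D vectors stand in for 128-D descriptors.

```python
import numpy as np

def l2_normalize(v):
    """Scale each row vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cosine_distance_matrix(track_feats, det_feats):
    """Pairwise cosine distance; with unit vectors this is 1 - dot product."""
    return 1.0 - l2_normalize(track_feats) @ l2_normalize(det_feats).T

track_feats = [[3.0, 0.0], [0.0, 2.0]]
det_feats = [[1.0, 0.0], [1.0, 1.0]]
dist = cosine_distance_matrix(track_feats, det_feats)
```

Vector length drops out after normalization, so only the direction of the descriptor matters for matching.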
11
Gallery of features?
📊 medium
Answer: Store recent embeddings per track for matching—manage length to balance memory and adaptability to appearance change.
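One way to sketch a bounded per-track gallery is a fixed-length deque, taking the smallest cosine distance to any stored embedding as the track's appearance cost. `FeatureGallery` and the `budget` parameter are hypothetical names for illustration.

```python
from collections import deque
import numpy as np

class FeatureGallery:
    def __init__(self, budget=100):
        # Oldest embeddings are dropped automatically once the budget is reached.
        self.features = deque(maxlen=budget)

    def add(self, feature):
        f = np.asarray(feature, dtype=float)
        self.features.append(f / np.linalg.norm(f))

    def distance(self, feature):
        """Smallest cosine distance between the query and any stored embedding."""
        f = np.asarray(feature, dtype=float)
        f = f / np.linalg.norm(f)
        return float(min(1.0 - g @ f for g in self.features))

gallery = FeatureGallery(budget=2)
gallery.add([1.0, 0.0])
gallery.add([0.0, 1.0])
gallery.add([1.0, 1.0])  # evicts [1, 0]
d_recent = gallery.distance([0.0, 2.0])
d_evicted = gallery.distance([1.0, 0.0])
```

A small budget adapts quickly to appearance change but forgets earlier views; a large one remembers more poses at the cost of memory and comparison time.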
12
Occlusion?
📊 medium
Answer: IoU association becomes ambiguous when boxes overlap or a target is hidden. Appearance helps reacquire the correct ID after the targets separate, but this is still hard in dense crowds.
13
Why fast?
⚡ easy
Answer: Minimal overhead beyond detector—no heavy joint optimization per frame unlike some MHT approaches.
14
What is ByteTrack?
📊 medium
Answer: Also associates low-score detections in a second pass—recovers occluded objects SORT might drop.
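The two-pass idea can be sketched on a precomputed IoU matrix: high-score detections are matched first, then the leftover tracks get a second chance against low-score detections. The function names and the thresholds (`high=0.6`, `low=0.1`, `iou_min=0.3`) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def _match(iou, tracks, dets, iou_min):
    """Hungarian matching restricted to the given track and detection indices."""
    if not tracks or not dets:
        return [], list(tracks)
    sub = iou[np.ix_(tracks, dets)]
    rows, cols = linear_sum_assignment(1.0 - sub)
    matches = [(tracks[r], dets[c]) for r, c in zip(rows, cols) if sub[r, c] >= iou_min]
    matched = {t for t, _ in matches}
    return matches, [t for t in tracks if t not in matched]

def two_pass_associate(iou, scores, high=0.6, low=0.1, iou_min=0.3):
    tracks = list(range(iou.shape[0]))
    high_dets = [j for j, s in enumerate(scores) if s >= high]
    low_dets = [j for j, s in enumerate(scores) if low <= s < high]
    first, remaining = _match(iou, tracks, high_dets, iou_min)   # pass 1
    second, lost = _match(iou, remaining, low_dets, iou_min)     # pass 2
    return first + second, lost

# Track 1's object is partly occluded, so its detection scores only 0.3.
iou = np.array([[0.8, 0.0],
                [0.0, 0.5]])
scores = [0.9, 0.3]
matches, lost_tracks = two_pass_associate(iou, scores)
```

A tracker that discards every detection below 0.6 would leave track 1 unmatched here; the second pass keeps it alive.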
15
BoT-SORT?
🔥 hard
Answer: Adds camera motion compensation + improved Re-ID—strong MOTChallenge scores.
16
Dense crowds?
📊 medium
Answer: IoU-only methods degrade—appearance, higher-order models, or transformer MOT help.
17
Train appearance?
📊 medium
Answer: On person re-ID datasets (Market1501, etc.) separate from detector—domain gap to target scene matters.
18
SORT limits?
⚡ easy
Answer: Assumes good detector; IoU association weak under fast motion / low FPS; camera motion not modeled in vanilla SORT.
19
vs joint detectors?
🔥 hard
Answer: TrackFormer / MOTR predict tracks end-to-end—no hand-crafted association but need more data and compute.
20
Production?
📊 medium
Answer: Keep tracking latency within the detector's frame budget; batch Re-ID CNN inference over all crops in a frame; tune thresholds per scene; log ID switches for QA.
SORT / DeepSORT Cheat Sheet
SORT
- Kalman + IoU
- Hungarian
DeepSORT
- Appearance
- Cascade
Follow-on
- ByteTrack
💡 Pro tip: DeepSORT adds appearance when IoU is not enough.