SORT & DeepSORT: 20 Essential Q&A

A fast MOT baseline—Kalman + Hungarian—and the Re-ID upgrade for crowded scenes.

1 What is SORT? 📊 medium

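Answer: SORT (Simple Online and Realtime Tracking) is a tracking-by-detection baseline. A constant-velocity Kalman filter predicts each track's box in the new frame, and the Hungarian algorithm assigns detections to those predictions using an IoU-based cost. With no appearance model, it adds very little computation on top of the detector.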
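A minimal sketch of the Kalman predict step follows. It assumes the 7-D SORT state [u, v, s, r, u̇, v̇, ṡ] (box center, area, aspect ratio, and velocities of the first three) and uses made-up noise values rather than tuned ones. The helper names sort_transition and kf_predict are hypothetical.

```python
import numpy as np

def sort_transition(dt=1.0):
    """Constant-velocity transition for state [u, v, s, r, du, dv, ds].
    u, v: box center; s: box area; r: aspect ratio (no velocity term, kept constant)."""
    F = np.eye(7)
    F[0, 4] = F[1, 5] = F[2, 6] = dt   # u, v, s advance by their velocities
    return F

def kf_predict(x, P, F, Q):
    """Kalman predict: propagate the state mean and covariance one step."""
    return F @ x, F @ P @ F.T + Q

# Hypothetical track: center (100, 50), area 400, aspect 0.5, moving +3 px/frame in u
x = np.array([100.0, 50.0, 400.0, 0.5, 3.0, 0.0, 0.0])
P = np.eye(7) * 10.0        # made-up initial uncertainty
Q = np.eye(7) * 0.01        # made-up process noise
x1, P1 = kf_predict(x, P, sort_transition(), Q)
print(x1[:4])               # predicted [u, v, s, r]
print(P1[0, 0])             # uncertainty in u grows after predicting
```

The Kalman update step, omitted here, would then correct x1 and P1 using the matched detection converted to [u, v, s, r].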

2 Steps each frame? 📊 medium

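Answer: Predict every track forward with its Kalman filter → match detections to the predicted boxes by IoU → update each matched track, using its detection as the Kalman measurement → start new tracks from unmatched detections → delete tracks that have stayed unmatched too long.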

3 Why Hungarian? ⚡ easy

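Answer: It finds the one-to-one assignment with the lowest total cost. Greedy max-IoU can lock in one locally best pair and thereby force a much worse match elsewhere when several tracks compete for the same detections.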
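A small made-up cost matrix shows where greedy matching loses. scipy.optimize.linear_sum_assignment computes the optimal one-to-one assignment. Its current implementation is a shortest-augmenting-path solver rather than the textbook Hungarian method, but it reaches the same optimum. greedy_assign is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def greedy_assign(cost):
    """Repeatedly take the cheapest remaining (track, detection) pair."""
    n_t, n_d = cost.shape
    pairs = sorted((cost[r, c], r, c) for r in range(n_t) for c in range(n_d))
    used_t, used_d, out = set(), set(), []
    for _, r, c in pairs:
        if r not in used_t and c not in used_d:
            out.append((r, c))
            used_t.add(r)
            used_d.add(c)
    return sorted(out)

def total(cost, pairs):
    return float(sum(cost[r, c] for r, c in pairs))

# Made-up 1 - IoU costs: rows = predicted tracks, cols = detections
cost = np.array([[0.1, 0.2],
                 [0.3, 0.9]])

greedy = greedy_assign(cost)
rows, cols = linear_sum_assignment(cost)
optimal = [(int(r), int(c)) for r, c in zip(rows, cols)]
print("greedy: ", greedy, total(cost, greedy))
print("optimal:", optimal, total(cost, optimal))
```

Greedy grabs the single cheapest pair (0.1) and is then forced into the most expensive one (0.9), for a total of 1.0. The optimal assignment accepts a slightly worse first pair and reaches a total of 0.5.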

4 Cost matrix? 📊 medium

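Answer: Usually cost = 1 − IoU (or −IoU) between each predicted box and each detection. After the assignment is solved, any pair whose IoU falls below a minimum (e.g. 0.3) is rejected, and both the track and the detection are treated as unmatched.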
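Below is a vectorized sketch of the 1 − IoU cost with rejection after the assignment step. The 0.3 threshold and the function names are illustrative assumptions, and the code assumes both box lists are non-empty. Minimizing 1 − IoU and minimizing −IoU select the same assignment, because every complete assignment of a given matrix contains the same number of pairs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format [x1, y1, x2, y2]."""
    a = np.asarray(a, float)[:, None, :]   # (N, 1, 4)
    b = np.asarray(b, float)[None, :, :]   # (1, M, 4)
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def associate(pred_boxes, det_boxes, iou_min=0.3):
    """Assign on cost = 1 - IoU, then discard pairs whose IoU is below iou_min."""
    ious = iou_matrix(pred_boxes, det_boxes)
    rows, cols = linear_sum_assignment(1.0 - ious)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if ious[r, c] >= iou_min]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(pred_boxes)) if i not in matched_t]
    unmatched_d = [j for j in range(len(det_boxes)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d

preds = [[0, 0, 10, 10], [50, 50, 60, 60]]
dets = [[1, 0, 11, 10], [100, 100, 110, 110]]
print(associate(preds, dets))
```

The solver still pairs track 1 with detection 1 because it must fill the square matrix. Their IoU is 0, so the threshold check turns that forced pair back into an unmatched track and an unmatched detection.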

5 max_age / min_hits? 📊 medium

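Answer: A track is deleted once it has gone unmatched for more than max_age consecutive frames. A new track is only confirmed (reported) after min_hits consecutive matches, which suppresses spurious tracks from one-off false positives.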
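Here is a minimal sketch of SORT-style lifecycle bookkeeping. It assumes a consecutive hit streak for confirmation and uses illustrative defaults (min_hits=3, max_age=1). The class and method names are hypothetical. DeepSORT uses a slightly different tentative/confirmed/deleted state machine.

```python
from dataclasses import dataclass

@dataclass
class TrackLife:
    """Lifecycle counters only; motion and appearance state are omitted."""
    hit_streak: int = 0           # consecutive frames with a matched detection
    time_since_update: int = 0    # consecutive frames without a match

    def matched(self):
        self.hit_streak += 1
        self.time_since_update = 0

    def missed(self):
        self.hit_streak = 0       # a miss breaks the streak
        self.time_since_update += 1

    def confirmed(self, min_hits=3):
        return self.hit_streak >= min_hits

    def dead(self, max_age=1):
        return self.time_since_update > max_age

life, log = TrackLife(), []
for was_matched in [True, True, True, False, False]:
    if was_matched:
        life.matched()
    else:
        life.missed()
    log.append((life.confirmed(), life.dead()))
print(log)   # (confirmed, dead) after each frame
```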

6 What does DeepSORT add? 🔥 hard

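Answer: It adds a CNN appearance descriptor for every detection. The association cost uses the cosine distance between track and detection appearance features. The Kalman filter's Mahalanobis distance gates out pairs that are inconsistent with the predicted motion; the paper's cost is a weighted sum of the two terms. This reduces ID switches when IoU is ambiguous, for example after occlusions.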
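The following sketch builds a DeepSORT-style cost from two precomputed matrices. It assumes the paper's weighted form λ·d_motion + (1 − λ)·d_appearance. A pair is admissible only if it passes both the chi-square motion gate and a cosine-distance threshold. The max_cos value of 0.2, the inf_cost sentinel, and the function name are assumptions for illustration.

```python
import numpy as np

CHI2_95_4DOF = 9.4877   # 95% chi-square quantile, 4 DOF: the motion gate threshold

def deepsort_cost(maha_sq, cos_dist, lam=0.0, max_cos=0.2, inf_cost=1e5):
    """Combine motion and appearance terms into one (tracks x detections) cost.

    maha_sq:  squared Mahalanobis distances from each track's Kalman prediction
    cos_dist: smallest cosine distances to each track's appearance gallery
    lam:      weight of the motion term (0 = appearance-only cost, motion as a gate)"""
    cost = lam * maha_sq + (1.0 - lam) * cos_dist
    admissible = (maha_sq <= CHI2_95_4DOF) & (cos_dist <= max_cos)
    return np.where(admissible, cost, inf_cost)

maha_sq = np.array([[1.0, 20.0],
                    [3.0, 2.0]])
cos_dist = np.array([[0.10, 0.05],
                     [0.50, 0.15]])
print(deepsort_cost(maha_sq, cos_dist))
```

Pair (0, 1) has the best appearance score but fails the motion gate. Pair (1, 0) fails the appearance gate. Both therefore receive the sentinel cost before the assignment is solved.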

7 Cosine metric learning? 📊 medium

Answer: Train the embedding network so that crops of the same identity end up closer, in cosine distance, than crops of different identities. This is done with triplet-style or classification losses on person crops.
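One way to express this objective is a hinge triplet loss on cosine distance, sketched below in NumPy. It uses a made-up margin and toy 2-D vectors. This is only the loss computation, not a training loop, and it is not necessarily the loss used to train DeepSORT's released network.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss with cosine distance on L2-normalized embeddings.
    Becomes zero once d(anchor, positive) + margin <= d(anchor, negative)."""
    a, p, n = unit(anchor), unit(positive), unit(negative)
    d_ap = 1.0 - np.sum(a * p, axis=-1)   # same identity: want small
    d_an = 1.0 - np.sum(a * n, axis=-1)   # different identity: want large
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))

# Toy 2-D "embeddings" (real ones come from a CNN applied to person crops)
good = triplet_loss([1, 0], [1, 0.1], [0, 1])   # positive already much closer
bad = triplet_loss([1, 0], [0, 1], [1, 0.1])    # positive and negative swapped
print(good, bad)
```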

8 Cascade matching in DeepSORT? 🔥 hard

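Answer: DeepSORT matches tracks in order of how recently they were last seen. It first matches tracks updated in the previous frame, then tracks missed for one more frame, and so on up to max_age. Each stage takes only the detections left over from earlier stages. This ordering matters because a long-occluded track has a large predicted covariance, so its Mahalanobis gate admits many detections and it could otherwise steal one from a recently seen track. A final IoU stage, as in SORT, then handles unconfirmed tracks and recently seen tracks that the cascade left unmatched. (Prioritizing high-confidence detections over low-confidence ones is ByteTrack's idea, not DeepSORT's cascade.)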
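Below is a sketch of the cascade loop on a precomputed, already-gated cost matrix. The value max_age=30, the inf_cost sentinel, and the function name are assumptions. The final IoU stage over unconfirmed and age-1 tracks is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_cascade(cost, ages, max_age=30, inf_cost=1e5):
    """Match tracks to detections in order of track age.

    cost: (T, D) gated appearance cost; gated-out pairs hold inf_cost.
    ages: frames since each track was last updated (1 = updated last frame).
    Returns (matches as (track, det) pairs, unmatched detection indices)."""
    unmatched = list(range(cost.shape[1]))
    matches = []
    for age in range(1, max_age + 1):
        if not unmatched:
            break
        level = [i for i, a in enumerate(ages) if a == age]
        if not level:
            continue
        sub = cost[np.ix_(level, unmatched)]
        rows, cols = linear_sum_assignment(sub)
        taken = set()
        for r, c in zip(rows, cols):
            if sub[r, c] < inf_cost:              # ignore gated-out pairs
                matches.append((level[r], unmatched[c]))
                taken.add(unmatched[c])
        unmatched = [d for d in unmatched if d not in taken]
    return matches, unmatched

# Made-up costs: track 0 was occluded for 5 frames, track 1 was seen last frame
cost = np.array([[0.05],
                 [0.10]])
ages = [5, 1]
print(matching_cascade(cost, ages))               # recent track 1 gets the detection
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows.tolist(), cols.tolist())))    # one global assignment picks track 0
```

A single global assignment would hand the detection to the long-occluded track 0 because its cost is lower. The cascade gives the recently seen track 1 priority instead.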

9 Mahalanobis gate? 📊 medium

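Answer: Measure how unlikely the innovation z − Hx is under the innovation covariance S = HPHᵀ + R, via the squared Mahalanobis distance d² = (z − Hx)ᵀ S⁻¹ (z − Hx). Pairs with d² above a chi-square threshold are rejected; DeepSORT uses 9.4877, the 95% quantile for 4 degrees of freedom. This filters out physically implausible jumps.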
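A minimal sketch of the gate for a single track-detection pair follows. It assumes a DeepSORT-like 8-D state whose first four entries (center x, center y, aspect ratio, height) are observed directly. The covariances and the function name are made up for illustration.

```python
import numpy as np

# 95% quantile of the chi-square distribution with 4 degrees of freedom,
# matching a 4-D measurement (center x, center y, aspect ratio, height)
CHI2_95_4DOF = 9.4877

def mahalanobis_gate(z, x_pred, P_pred, H, R, threshold=CHI2_95_4DOF):
    """Return (squared Mahalanobis distance, passes_gate) for one detection z."""
    y = z - H @ x_pred                # innovation
    S = H @ P_pred @ H.T + R          # innovation covariance
    d2 = float(y @ np.linalg.solve(S, y))
    return d2, d2 <= threshold

# Hypothetical 8-D state: [x, y, a, h] plus their velocities; H observes the first 4
H = np.hstack([np.eye(4), np.zeros((4, 4))])
x_pred = np.array([100.0, 50.0, 0.5, 80.0, 0, 0, 0, 0])
P_pred = np.eye(8)                    # made-up predicted covariance
R = np.eye(4)                         # made-up measurement noise

print(mahalanobis_gate(np.array([101.0, 50.0, 0.5, 80.0]), x_pred, P_pred, H, R))
print(mahalanobis_gate(np.array([110.0, 50.0, 0.5, 80.0]), x_pred, P_pred, H, R))
```

Using np.linalg.solve instead of explicitly inverting S is the usual numerically safer choice.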

10 Descriptor dimension? ⚡ easy

Answer: Typically a 128-D, L2-normalized vector per detection crop. For unit vectors, cosine distance = 1 − dot product.
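The next sketch computes the track-by-detection cosine distance matrix. Random vectors serve as hypothetical stand-ins for CNN embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    x = np.asarray(x, float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def cosine_distance_matrix(track_feats, det_feats):
    """(T, K) x (D, K) -> (T, D); for unit vectors, cosine distance = 1 - dot product."""
    return 1.0 - l2_normalize(track_feats) @ l2_normalize(det_feats).T

rng = np.random.default_rng(0)
track_feats = rng.normal(size=(3, 128))                      # stand-in 128-D embeddings
det_feats = track_feats + 0.05 * rng.normal(size=(3, 128))   # noisy re-observations
dist = cosine_distance_matrix(track_feats, det_feats)
print(dist.round(3))
print(dist.argmin(axis=1))   # each track's nearest detection
```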

11 Gallery of features? 📊 medium

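Answer: Each track stores its recent embeddings, up to a budget (DeepSORT keeps the last 100). The appearance cost to a detection is the smallest cosine distance to any stored embedding. The budget trades memory against how quickly the track adapts to appearance changes.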
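A minimal per-track gallery is sketched below. It assumes a fixed budget enforced with collections.deque(maxlen=...) and applies the minimum-distance rule. The class name is hypothetical, and the example uses a tiny budget only to make eviction visible.

```python
from collections import deque
import numpy as np

class FeatureGallery:
    """Per-track store of recent L2-normalized appearance embeddings."""

    def __init__(self, budget=100):
        # deque(maxlen=...) drops the oldest embedding once the budget is full
        self.feats = deque(maxlen=budget)

    def add(self, feat):
        f = np.asarray(feat, float)
        self.feats.append(f / np.linalg.norm(f))

    def distance(self, det_feat):
        """Smallest cosine distance from a detection to any stored embedding."""
        if not self.feats:
            return float("inf")       # no appearance history yet
        d = np.asarray(det_feat, float)
        d = d / np.linalg.norm(d)
        return float(min(1.0 - g @ d for g in self.feats))

g = FeatureGallery(budget=2)
for f in ([1, 0], [0, 1], [1, 1]):
    g.add(f)                          # [1, 0] is evicted when [1, 1] arrives
print(len(g.feats), g.distance([1, 0]))
```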

12 Occlusion? 📊 medium

Answer: IoU becomes ambiguous when boxes overlap, so IDs can swap during occlusion. Appearance features help restore the correct ID once the objects separate, but dense crowds remain hard.

13 Why fast? ⚡ easy

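Answer: Beyond running the detector, each frame needs only a Kalman predict/update per track and one assignment problem. There is no expensive joint optimization over many frames or hypotheses, unlike multiple hypothesis tracking (MHT) approaches.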

14 What is ByteTrack? 📊 medium

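Answer: Instead of discarding low-score detections, ByteTrack first matches tracks to high-score detections. It then matches the remaining tracks to low-score detections in a second pass. This recovers occluded objects, whose detector scores drop, that SORT would lose. Low-score boxes that match no track are discarded as background.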
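Here is a sketch of the two-pass association using plain IoU. Several parts are illustrative assumptions: the score thresholds (0.6 and 0.1), the single iou_min shared by both passes, and the function names. ByteTrack's reference code has its own thresholds and Kalman prediction, which are omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(a, b):
    """Pairwise IoU for (N, 4) and (M, 4) arrays in [x1, y1, x2, y2] format."""
    a, b = a[:, None, :], b[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def iou_match(tracks, dets, iou_min):
    """Optimal assignment on 1 - IoU; returns matches, unmatched tracks, unmatched dets."""
    if len(tracks) == 0 or len(dets) == 0:
        return [], list(range(len(tracks))), list(range(len(dets)))
    ious = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(1.0 - ious)
    m = [(int(r), int(c)) for r, c in zip(rows, cols) if ious[r, c] >= iou_min]
    ut = [i for i in range(len(tracks)) if i not in {r for r, _ in m}]
    ud = [j for j in range(len(dets)) if j not in {c for _, c in m}]
    return m, ut, ud

def byte_associate(track_boxes, det_boxes, scores, high=0.6, low=0.1, iou_min=0.3):
    T = np.asarray(track_boxes, float).reshape(-1, 4)
    D = np.asarray(det_boxes, float).reshape(-1, 4)
    s = np.asarray(scores, float)
    hi = np.flatnonzero(s >= high)                    # confident detections
    lo = np.flatnonzero((s >= low) & (s < high))      # weak ones a score cut would drop
    # Pass 1: every track against high-score detections
    m1, left_t, left_hi = iou_match(T, D[hi], iou_min)
    matches = [(t, int(hi[d])) for t, d in m1]
    # Pass 2: still-unmatched tracks against low-score detections
    m2, still_t, _ = iou_match(T[np.asarray(left_t, dtype=int)], D[lo], iou_min)
    matches += [(left_t[t], int(lo[d])) for t, d in m2]   # leftover low dets are dropped
    unmatched_tracks = [left_t[t] for t in still_t]
    births = [int(hi[d]) for d in left_hi]            # only high-score dets start tracks
    return matches, unmatched_tracks, births

tracks = [[0, 0, 10, 10], [50, 50, 60, 60]]
dets = [[1, 0, 11, 10],        # high score, overlaps track 0
        [50, 51, 60, 61],      # low score (e.g. partly occluded), overlaps track 1
        [200, 200, 210, 210],  # high score, no track yet
        [300, 300, 310, 310]]  # low score, matches nothing: treated as background
scores = [0.9, 0.3, 0.8, 0.2]
print(byte_associate(tracks, dets, scores))
```

A plain score cutoff at 0.6 would have thrown away detection 1 and left track 1 unmatched. The second pass keeps it attached to that track.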

15 BoT-SORT? 🔥 hard

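Answer: Builds on ByteTrack-style association and adds camera motion compensation, a refined Kalman state (box width and height instead of aspect ratio), and fused IoU + Re-ID costs. It reported strong MOTChallenge results.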

16 Dense crowds? 📊 medium

Answer: IoU-only methods degrade—appearance, higher-order models, or transformer MOT help.

17 Train appearance? 📊 medium

Answer: On person re-ID datasets (Market-1501, MARS, etc.), separately from the detector. The domain gap between those datasets and the target scene matters.

18 SORT limits? ⚡ easy

Answer: Relies on a good detector. IoU association breaks down under fast motion or at low frame rates. Vanilla SORT does not model camera motion, and without an appearance model it tends to switch IDs after occlusions.

19 vs joint detectors? 🔥 hard

Answer: TrackFormer and MOTR predict tracks end-to-end by carrying track queries across frames in a transformer. This removes hand-crafted association, but they need more training data and compute.

20 Production? 📊 medium

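Answer: Run the tracker at the detector's frame rate; max_age and the motion gates are counted in frames, so retune them if FPS changes. Batch all of a frame's crops through the Re-ID CNN, tune thresholds per scene, and log ID switches for QA.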

SORT / DeepSORT Cheat Sheet

SORT: Kalman + IoU, Hungarian
DeepSORT: Appearance, Cascade
Follow-on: ByteTrack

💡 Pro tip: DeepSORT adds appearance when IoU is not enough.
