COCO-17 keypoints (idea)
The order is nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles, with left before right in each pair. Each predicted point has (x, y) and usually a confidence score; low confidence often indicates an occluded or out-of-frame joint. To render a skeleton, connect pairs of keypoints using a fixed edge list.
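A minimal sketch of the idea above: the 17 names in COCO index order, plus one possible edge list and a helper that keeps only edges whose endpoints are both confident. The edge list, the helper name visible_edges, and the 0.3 threshold are illustrative assumptions, not part of the COCO definition.

```python
# COCO-17 keypoint names in index order (left before right within each pair).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# One common edge list for drawing. Which edges to draw is a rendering
# choice (an assumption here), not part of the keypoint definition.
SKELETON_EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),           # head
    (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),  # shoulders and arms
    (5, 11), (6, 12), (11, 12),               # torso
    (11, 13), (13, 15), (12, 14), (14, 16),   # legs
]


def visible_edges(keypoints, min_conf=0.3):
    """Return segments ((xa, ya), (xb, yb)) whose endpoints both meet min_conf.

    keypoints: sequence of 17 (x, y, confidence) tuples in COCO order.
    min_conf:  illustrative threshold; tune it for the model in use.
    """
    segments = []
    for a, b in SKELETON_EDGES:
        xa, ya, ca = keypoints[a]
        xb, yb, cb = keypoints[b]
        # Skip the edge if either joint is likely occluded or out of frame.
        if ca >= min_conf and cb >= min_conf:
            segments.append(((xa, ya), (xb, yb)))
    return segments
```

Each returned segment can be passed to any line-drawing routine; dropping low-confidence endpoints avoids drawing limbs toward guessed positions.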
MediaPipe Pose (Python)
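# pip install mediapipe opencv-python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

# "person.jpg" is a placeholder path; cv2.imread returns None on failure.
img_bgr = cv2.imread("person.jpg")
if img_bgr is None:
    raise FileNotFoundError("person.jpg")

# OpenCV loads BGR, but MediaPipe expects RGB.
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

with mp_pose.Pose(static_image_mode=True) as pose:
    res = pose.process(img_rgb)
    # pose_landmarks is None when no person is detected.
    if res.pose_landmarks:
        mp_draw.draw_landmarks(
            img_bgr, res.pose_landmarks, mp_pose.POSE_CONNECTIONS)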
For video, set static_image_mode=False and reuse the same Pose instance across frames for smoother tracking.
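The video advice above could look roughly like the following sketch. The helper names track_poses, camera_frames_rgb, and demo are hypothetical; the point is only that a single Pose instance is created once and fed every frame in order.

```python
from typing import Any, Iterable, Iterator, Optional


def track_poses(frames_rgb: Iterable[Any], pose: Any) -> Iterator[Optional[Any]]:
    """Feed every frame to the SAME pose object and yield its landmarks.

    pose is any object with a MediaPipe-style process(image) method whose
    result exposes pose_landmarks (None when nobody is detected).
    """
    for frame in frames_rgb:
        result = pose.process(frame)
        yield getattr(result, "pose_landmarks", None)


def camera_frames_rgb(source=0):
    """Yield RGB frames from a camera index or video path (hypothetical helper)."""
    import cv2  # imported lazily so track_poses has no hard dependency

    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            yield cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    finally:
        cap.release()


def demo():
    """Usage sketch: one Pose instance for the whole stream."""
    import mediapipe as mp

    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        for landmarks in track_poses(camera_frames_rgb(0), pose):
            if landmarks is not None:
                print(len(landmarks.landmark), "landmarks")
```

Creating a new Pose object per frame would discard the state that video mode relies on, so the instance is built outside the loop.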
OpenCV DNN (OpenPose-style)
OpenCV's DNN samples load Caffe/ONNX multi-branch models that output heatmaps and part affinity fields (PAFs). You download the model files from the OpenCV GitHub wiki, run net.forward(), then decode heatmap peaks and use the PAFs to associate them into limbs. This takes more code than MediaPipe, but it runs fully offline and every stage is customizable.
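The peak-decoding step can be sketched for the simplest case, a single person, by taking the strongest location in each heatmap channel. This is a simplification: it ignores the part affinity fields, which multi-person association needs. The function name decode_single_person, the (num_parts, H, W) input layout, and the 0.1 threshold are assumptions for illustration.

```python
import numpy as np


def decode_single_person(heatmaps, frame_w, frame_h, threshold=0.1):
    """Pick the strongest peak in each heatmap channel (single-person case).

    heatmaps: array-like of shape (num_parts, H, W), assumed to be the
              heatmap channels sliced out of the network output.
    Returns one entry per part: (x, y, score) in frame pixels, or None
    when the best score is below threshold.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float32)
    num_parts, h, w = heatmaps.shape
    points = []
    for part in range(num_parts):
        # argmax over the flattened map, then recover (row, col).
        flat_idx = int(np.argmax(heatmaps[part]))
        row, col = divmod(flat_idx, w)
        score = float(heatmaps[part, row, col])
        if score < threshold:
            points.append(None)
            continue
        # Scale from heatmap grid coordinates to frame pixels.
        x = col * frame_w / w
        y = row * frame_h / h
        points.append((x, y, score))
    return points
```

For several people in one frame, each channel has multiple peaks, and the PAF channels are used to decide which peaks belong to the same limb.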
3D pose
3D pose estimation extends 2D keypoints to camera-centered 3D joint coordinates, obtained by monocular lifting, multi-view fusion, or depth sensors. It is often paired with biomechanics analysis or AR applications.
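For the depth-sensor route, one way to get camera-centered coordinates is pinhole back-projection of each 2D keypoint using its measured depth. The sketch below assumes a calibrated camera with intrinsics (fx, fy, cx, cy) and a depth value per keypoint; the function names and the convention that a missing depth is None or non-positive are illustrative.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth Z to (X, Y, Z).

    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy, in the depth's units.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_keypoints(keypoints_2d, depths, intrinsics):
    """Lift 2D keypoints to 3D; entries with no valid depth become None.

    keypoints_2d: list of (u, v) pixel coordinates.
    depths:       list of depth values aligned with keypoints_2d.
    intrinsics:   (fx, fy, cx, cy), assumed known from calibration.
    """
    fx, fy, cx, cy = intrinsics
    lifted = []
    for (u, v), z in zip(keypoints_2d, depths):
        if z is None or z <= 0:
            lifted.append(None)  # sensor returned no usable depth here
        else:
            lifted.append(backproject(u, v, z, fx, fy, cx, cy))
    return lifted
```

Monocular lifting and multi-view fusion replace the measured depth with a learned estimate or with triangulation, but they target the same camera-centered output.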
Takeaways
- Normalize crops and augment data for robustness to scale and clothing.
- Multi-person scenes need association (top-down boxes or bottom-up grouping).
- Ethics: pose estimation in public spaces raises consent and surveillance concerns.
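The crop normalization and top-down association mentioned in the takeaways can be sketched as two small helpers: one turns a person box into a padded square crop, the other maps keypoints predicted inside that crop back to image pixels. The names square_crop and crop_to_image, the 1.25 padding factor, and the assumption that the estimator returns coordinates normalized to [0, 1] within the crop are all illustrative.

```python
def square_crop(box, pad=1.25):
    """Expand a person box (x, y, w, h) to a padded square around its center.

    Returns (left, top, side) in image pixels. A square crop keeps the
    aspect ratio fixed so people appear at a consistent scale.
    """
    x, y, w, h = box
    side = max(w, h) * pad
    cx, cy = x + w / 2.0, y + h / 2.0
    return (cx - side / 2.0, cy - side / 2.0, side)


def crop_to_image(keypoints_norm, crop):
    """Map (kx, ky, conf) with kx, ky in [0, 1] of the crop to image pixels."""
    left, top, side = crop
    return [(left + kx * side, top + ky * side, conf)
            for kx, ky, conf in keypoints_norm]
```

In a top-down pipeline this runs once per detected box, so each person's keypoints are already associated with that person and no grouping step is needed.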