Pose estimation: CV guide

COCO-17 keypoints (idea)

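The COCO keypoint format defines 17 joints in a fixed order: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles (left before right within each pair). Each predicted point has (x, y) coordinates and usually a confidence score; low confidence often indicates occlusion or a joint outside the frame. To render a skeleton, connect pairs of keypoints using a fixed edge list.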
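As a rough illustration of the idea, the sketch below lists the 17 names in COCO order and filters a skeleton edge list by confidence. The edge list is one commonly used variant (renderers differ), and the names visible_edges and min_conf, as well as the 0.3 threshold, are assumptions made for this example.

```python
# COCO-17 keypoint names in annotation order (index 0..16).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# One possible 0-based edge list (an assumption; other renderers use variants).
COCO_EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),            # face
    (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),   # shoulders and arms
    (5, 11), (6, 12), (11, 12),                # torso
    (11, 13), (13, 15), (12, 14), (14, 16),    # legs
]


def visible_edges(keypoints, min_conf=0.3):
    """Return ((x1, y1), (x2, y2)) segments whose two endpoints both pass min_conf.

    keypoints: list of 17 (x, y, confidence) tuples in COCO order.
    """
    segments = []
    for a, b in COCO_EDGES:
        xa, ya, ca = keypoints[a]
        xb, yb, cb = keypoints[b]
        if ca >= min_conf and cb >= min_conf:
            segments.append(((xa, ya), (xb, yb)))
    return segments
```

Each returned segment can be passed to a line-drawing call; edges touching a low-confidence joint are simply skipped.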

MediaPipe Pose (Python)

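# pip install mediapipe opencv-python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

# "person.jpg" is a placeholder path; cv2.imread returns None if the file cannot be read.
img_bgr = cv2.imread("person.jpg")
if img_bgr is None:
    raise FileNotFoundError("could not read person.jpg")

# MediaPipe expects RGB input; OpenCV loads images as BGR.
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
with mp_pose.Pose(static_image_mode=True) as pose:
    res = pose.process(img_rgb)
    # pose_landmarks is None when no person is detected.
    if res.pose_landmarks:
        # Draw the skeleton in place on the original BGR image.
        mp_draw.draw_landmarks(
            img_bgr, res.pose_landmarks, mp_pose.POSE_CONNECTIONS)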

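For video, set static_image_mode=False and reuse the same Pose instance across frames, so the model can track landmarks from one frame to the next instead of re-detecting on every frame; this gives smoother results.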
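To work with the landmark values directly rather than only drawing them, it may help to convert them to pixel coordinates. MediaPipe Pose returns 33 landmarks (not the COCO-17 set) with x and y normalized by image width and height, plus a visibility score. The helper below is a minimal sketch; landmarks_to_pixels and the 0.5 cutoff are assumptions for this example, and SimpleNamespace objects stand in for real landmarks so the snippet runs without MediaPipe.

```python
from types import SimpleNamespace


def landmarks_to_pixels(landmarks, width, height, min_visibility=0.5):
    """Convert normalized landmarks to pixel coordinates.

    landmarks: iterable of objects with .x, .y and .visibility attributes,
    for example res.pose_landmarks.landmark from MediaPipe Pose.
    Returns one entry per landmark: (px, py), or None if visibility is
    below min_visibility (hypothetical cutoff).
    """
    points = []
    for lm in landmarks:
        if lm.visibility < min_visibility:
            points.append(None)
        else:
            points.append((int(lm.x * width), int(lm.y * height)))
    return points


# Stand-in landmarks for demonstration only.
demo = [
    SimpleNamespace(x=0.5, y=0.25, visibility=0.9),
    SimpleNamespace(x=0.1, y=0.9, visibility=0.2),
]
pixels = landmarks_to_pixels(demo, width=640, height=480)
```

With a real result, the call would look like landmarks_to_pixels(res.pose_landmarks.landmark, img_bgr.shape[1], img_bgr.shape[0]).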

OpenCV DNN (OpenPose-style)

OpenCV's samples load multi-branch Caffe/ONNX models that output heatmaps and part affinity fields. You download the model files from the OpenCV GitHub wiki, run net.forward, then decode heatmap peaks and associate limbs. This takes more code than MediaPipe but runs fully offline and is customizable.
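The peak-decoding step can be sketched in plain Python for the single-person case: take the strongest cell of one part's heatmap and rescale it to frame coordinates. This is only an illustration under assumptions (heatmap_peak, min_conf and the 0.1 threshold are made up here); multi-person decoding additionally needs non-maximum suppression and part-affinity matching, which are not shown.

```python
def heatmap_peak(heatmap, frame_w, frame_h, min_conf=0.1):
    """Locate the strongest response in one body-part heatmap.

    heatmap: 2D list of floats (rows of equal length) for a single part.
    Returns (x, y, score) in frame pixels, or None if the peak is weaker
    than min_conf (hypothetical threshold).
    """
    heat_h = len(heatmap)
    heat_w = len(heatmap[0])
    best_val, best_x, best_y = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_val, best_x, best_y = val, x, y
    if best_val < min_conf:
        return None
    # Rescale from heatmap grid coordinates to frame coordinates.
    return (frame_w * best_x / heat_w, frame_h * best_y / heat_h, best_val)
```

In practice the network output would be a 4D array, and one such heatmap would be sliced out per body part before calling a helper like this.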

3D pose

3D pose estimation extends the 2D task to camera-centered 3D joint coordinates, obtained by monocular lifting, multi-view fusion, or depth sensors. It is often coupled with biomechanics or AR applications.
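For the depth-sensor route, a 2D keypoint with a measured depth can be back-projected into camera coordinates using the standard pinhole model. The sketch below assumes known intrinsics (fx, fy, cx, cy), a depth value aligned to the color image, and no lens distortion; the function name backproject is invented for this example.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z into camera-frame (X, Y, Z).

    Pinhole model without distortion: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    depth is in the sensor's units (for example metres); fx, fy are focal
    lengths in pixels and (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Example intrinsics below are placeholders, not values from a real camera.
joint_3d = backproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

A joint at the principal point maps to X = Y = 0, so it lies on the optical axis at the measured depth.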

Takeaways

Normalize crops and augment data for robustness to scale and clothing.
Multi-person scenes need association (top-down boxes or bottom-up grouping).
Ethics: pose estimation in public spaces raises consent and surveillance concerns.
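For the top-down approach mentioned above, each detected person box is usually normalized to a fixed aspect ratio before being resized for the pose model, and predicted keypoints are then mapped back to the full image. The sketch below shows one way this could look; expand_box, crop_to_image, the 0.75 aspect ratio (matching a 192x256 input) and the 1.25 padding factor are all assumptions for illustration.

```python
def expand_box(x, y, w, h, aspect=0.75, pad=1.25):
    """Grow a box (top-left x, y, width w, height h) around its centre.

    The box is first widened or heightened until width / height == aspect,
    then scaled by pad. The result may extend past the image border, so the
    caller is expected to pad the image when cropping.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    if w > aspect * h:
        h = w / aspect
    else:
        w = aspect * h
    w, h = w * pad, h * pad
    return (cx - w / 2.0, cy - h / 2.0, w, h)


def crop_to_image(kx, ky, box, crop_w, crop_h):
    """Map a keypoint from resized-crop pixels back to full-image pixels."""
    bx, by, bw, bh = box
    return (bx + kx * bw / crop_w, by + ky * bh / crop_h)
```

The centre of the resized crop maps back to the centre of the original box, which is a quick sanity check when wiring this into a pipeline.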

                

Quick FAQ

Jittery keypoints?

Temporal smoothing (Kalman filter, exponential moving average) or a higher input resolution often helps.
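An exponential moving average is the simpler of the two smoothing options and can be sketched in a few lines. The class name KeypointEMA and the default alpha of 0.5 are assumptions; lower alpha smooths more but adds lag.

```python
class KeypointEMA:
    """Exponential moving average over per-frame keypoint lists."""

    def __init__(self, alpha=0.5):
        # alpha in (0, 1]: weight of the newest observation.
        self.alpha = alpha
        self.state = None

    def update(self, points):
        """points: list of (x, y) for the current frame.

        Returns the smoothed list; the first frame is passed through as-is.
        """
        if self.state is None:
            self.state = [tuple(p) for p in points]
        else:
            a = self.alpha
            self.state = [
                (a * x + (1 - a) * sx, a * y + (1 - a) * sy)
                for (x, y), (sx, sy) in zip(points, self.state)
            ]
        return self.state
```

One smoother instance would be kept per tracked person and fed that person's keypoints on every frame.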

Occlusion?

Models may hallucinate hidden joints; use confidence thresholds and temporal consistency checks to filter them out.
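One possible way to combine both checks is to drop joints below a confidence threshold and also drop joints that move implausibly far between consecutive frames. The function gate_keypoints and both thresholds (0.3 confidence, 50 pixels) are assumptions for this sketch and would need tuning to the frame rate and resolution.

```python
import math


def gate_keypoints(prev, curr, min_conf=0.3, max_jump=50.0):
    """Filter the current frame's keypoints against the previous output.

    prev: list of (x, y) or None, as returned for the previous frame.
    curr: list of (x, y, confidence) for the current frame.
    Returns a list of (x, y) or None, one entry per joint.
    """
    out = []
    for p, (x, y, c) in zip(prev, curr):
        if c < min_conf:
            # Low confidence: treat the joint as not visible.
            out.append(None)
        elif p is not None and math.hypot(x - p[0], y - p[1]) > max_jump:
            # Implausible jump since the last accepted position: reject.
            out.append(None)
        else:
            out.append((x, y))
    return out
```

Because a rejected joint becomes None, the next frame is accepted without a distance check, so a genuine fast movement costs one dropped frame rather than locking the joint out.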
