Computer Vision Chapter 36

Video processing

A video is a time-ordered sequence of frames (2D images). FPS (frames per second), resolution (width × height), and codec define how pixels are stored and decoded. Pipelines either run per frame (object detection) or model time explicitly (optical flow, 3D CNNs, transformers). OpenCV’s VideoCapture is the usual entry point for files and cameras; VideoWriter encodes output. Below: metadata, seeking, writing, and optional torchvision loading for tensor workflows.

OpenCV: read a file

import cv2

cap = cv2.VideoCapture("clip.mp4")
if not cap.isOpened():
    raise RuntimeError("cannot open video")

fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(fps, w, h, n)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # frame: BGR uint8, shape (h, w, 3)

cap.release()

Always call release() (or use a context-style pattern) so file handles and camera devices are freed.
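One context-style pattern is a small wrapper that guarantees release() on exit even if an exception is raised mid-loop. A minimal sketch; the `releasing` helper is mine, not an OpenCV API:

```python
from contextlib import contextmanager

@contextmanager
def releasing(resource):
    """Yield any object exposing .release() and release it on exit.
    Works for cv2.VideoCapture and cv2.VideoWriter alike."""
    try:
        yield resource
    finally:
        resource.release()

# usage (sketch):
# with releasing(cv2.VideoCapture("clip.mp4")) as cap:
#     ok, frame = cap.read()
```

Because it only relies on a `.release()` method, the same wrapper covers cameras, file readers, and writers.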

Frame index and seek

cap.set(cv2.CAP_PROP_POS_FRAMES, 120)  # jump to frame 120
ok, frame = cap.read()
ms = cap.get(cv2.CAP_PROP_POS_MSEC)     # position in ms (if available)

Seek accuracy depends on codec and container: many backends seek only to the nearest keyframe, so the decoded frame may not be exactly the one requested. For frame-accurate access, seek to an earlier keyframe and read forward to the target index.
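Converting between timestamps and frame indices comes up constantly when seeking; a minimal sketch (helper names are mine, not OpenCV's):

```python
def time_to_frame(seconds: float, fps: float) -> int:
    """Nearest frame index for a timestamp."""
    return int(round(seconds * fps))

def frame_to_msec(index: int, fps: float) -> float:
    """Timestamp in milliseconds of a frame index."""
    return index * 1000.0 / fps

# e.g. at 25 fps, t = 4.8 s corresponds to frame 120:
# cap.set(cv2.CAP_PROP_POS_FRAMES, time_to_frame(4.8, 25.0))
```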

Webcam and backend

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)  # Windows: DirectShow backend (optional)
if not cap.isOpened():
    raise RuntimeError("camera not available")
# resolution requests are best-effort; the driver may ignore them
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

Write MP4 (example)

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("out.mp4", fourcc, 30.0, (640, 480))  # (width, height)
if not out.isOpened():
    raise RuntimeError("cannot open writer (codec unsupported?)")
# ... for each BGR frame, sized exactly 640x480:
# out.write(frame)
out.release()

Codec fourcc must match what your OpenCV build supports; on some systems avc1 or H264 works better than mp4v.
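A FOURCC is nothing more than four ASCII characters packed little-endian into a 32-bit integer; a pure-Python sketch of the packing that cv2.VideoWriter_fourcc performs:

```python
def fourcc(code: str) -> int:
    """Pack a 4-character codec tag into the 32-bit int VideoWriter expects."""
    assert len(code) == 4
    return sum(ord(c) << (8 * i) for i, c in enumerate(code))

def fourcc_to_str(value: int) -> str:
    """Decode the tag back, e.g. to inspect cap.get(cv2.CAP_PROP_FOURCC)."""
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))
```

Decoding CAP_PROP_FOURCC this way is a quick check of which codec a file was actually opened with.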

torchvision: read_video

from torchvision.io import read_video

video, audio, info = read_video("clip.mp4", start_pts=0, end_pts=4, pts_unit="sec")
# video: (T, H, W, C) uint8 in RGB order
print(video.shape, info)

Useful for loading short training clips. Note that read_video decodes the whole requested range into memory; for long files prefer a streaming decoder (e.g. torchvision.io.VideoReader) that yields frames incrementally to limit RAM.
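Whichever decoder you use, training pipelines typically sample fixed-length clips from a longer video. A sketch of evenly spaced clip starts (helper name and scheme are my assumptions, not a torchvision API):

```python
def clip_starts(n_frames: int, clip_len: int, n_clips: int) -> list[int]:
    """Evenly spaced start indices for n_clips windows of clip_len frames.
    Assumes n_frames >= clip_len."""
    last = n_frames - clip_len          # latest valid start index
    if n_clips == 1:
        return [last // 2]              # single clip: take the center
    step = last / (n_clips - 1)
    return [round(i * step) for i in range(n_clips)]

# 100 frames, 16-frame clips, 4 clips -> [0, 28, 56, 84]
```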

Takeaways

  • BGR in OpenCV vs RGB in many deep models—convert with cv2.cvtColor when needed.
  • Temporal methods need consistent FPS or explicit timestamps.
  • Next: optical flow ties neighboring frames through motion fields.
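The BGR-to-RGB conversion in the first bullet is just a reversal of the last axis; a minimal NumPy sketch, equivalent in result to cv2.cvtColor with COLOR_BGR2RGB:

```python
import numpy as np

def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) frame.
    copy() makes the result contiguous for downstream libraries."""
    return frame[..., ::-1].copy()
```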

Quick FAQ

Q: Why is video reading slow, and what helps?
A: Decoding and disk I/O dominate; use a smaller resolution, skip frames, use GPU decoders, or extract frames offline.
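Frame skipping can be done by mapping source indices onto a lower target rate and keeping a frame only when the target index advances; a pure-Python sketch (assumed helper, not a cv2 API):

```python
def keep_frame(i: int, src_fps: float, dst_fps: float) -> bool:
    """True if frame i of a src_fps stream should be kept to emulate dst_fps."""
    if i == 0:
        return True
    return int(i * dst_fps / src_fps) != int((i - 1) * dst_fps / src_fps)

# 30 fps -> 10 fps keeps every third frame:
kept = [i for i in range(9) if keep_frame(i, 30.0, 10.0)]  # -> [0, 3, 6]
```

Combine this with cap.grab() (decode-free advance) for skipped frames to save further work.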

Q: Can I get grayscale (or raw) frames directly?
A: Some backends honor cap.set(cv2.CAP_PROP_CONVERT_RGB, 0) to skip color conversion (you then get the device's raw format); otherwise convert after reading with cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).