OpenCV: read a file
import cv2
cap = cv2.VideoCapture("clip.mp4")
if not cap.isOpened():
    raise RuntimeError("cannot open video")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(fps, w, h, n)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # frame: BGR uint8, shape (h, w, 3)
cap.release()
Always call release(), ideally from a try/finally block or a small context manager, so file handles and camera devices are freed even if processing raises.
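One way to make the release automatic is a tiny context manager. This is a minimal sketch, not part of OpenCV: the name `releasing` is hypothetical, and it assumes only that the object passed in has a `release()` method (as `cv2.VideoCapture` and `cv2.VideoWriter` do).

```python
from contextlib import contextmanager


@contextmanager
def releasing(resource):
    """Yield `resource`, then call its release() even if the body raises.

    `releasing` is a hypothetical helper name; it works with any object
    that exposes a release() method.
    """
    try:
        yield resource
    finally:
        resource.release()


# Intended use (needs OpenCV and a real file, so it is left as a comment):
#   import cv2
#   with releasing(cv2.VideoCapture("clip.mp4")) as cap:
#       ok, frame = cap.read()
```

The try/finally inside the helper is the same pattern you would otherwise write by hand around the read loop.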
Frame index and seek
cap.set(cv2.CAP_PROP_POS_FRAMES, 120) # jump to frame 120
ok, frame = cap.read()
ms = cap.get(cv2.CAP_PROP_POS_MSEC) # position in ms (if available)
Seek accuracy depends on the codec, container, and backend; some backends can seek only to keyframes, so read() may return the nearest keyframe rather than the frame you asked for.
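When the exact frame matters more than speed, one option is to skip seeking and step forward from the start. The sketch below assumes a capture that is freshly opened (positioned at frame 0); `read_frame_sequential` is a hypothetical helper name, and it relies only on the `grab()` and `read()` methods of `cv2.VideoCapture`.

```python
def read_frame_sequential(cap, index):
    """Return frame `index` (0-based) by stepping forward, or None if too short.

    Assumes `cap` is an opened capture positioned at frame 0.
    Hypothetical helper, not an OpenCV function.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    # grab() advances one frame without returning the image.
    for _ in range(index):
        if not cap.grab():
            return None
    ok, frame = cap.read()
    return frame if ok else None


# Intended use (needs OpenCV and a real file):
#   import cv2
#   cap = cv2.VideoCapture("clip.mp4")
#   frame = read_frame_sequential(cap, 120)
#   cap.release()
```

This costs time proportional to the frame index, so it suits occasional exact lookups rather than random access over a long file.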
Webcam and backend
cap = cv2.VideoCapture(0, cv2.CAP_DSHOW) # Windows: DirectShow optional
if not cap.isOpened():
    raise RuntimeError("camera not available")
# The requested size is a hint; the driver may pick a different mode.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
Write MP4 (example)
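fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("out.mp4", fourcc, 30.0, (640, 480)) # size is (width, height)
if not out.isOpened():
    raise RuntimeError("cannot open video writer")
# ... for each BGR frame of size 640x480:
# out.write(frame)
out.release()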
Codec fourcc must match what your OpenCV build supports; on some systems avc1 or H264 works better than mp4v.
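A fourcc is four ASCII characters packed into one integer, lowest byte first. The pure-Python sketch below mirrors that packing and also decodes a value back to text, which can help when inspecting the float that `cap.get(cv2.CAP_PROP_FOURCC)` returns for an existing file. The function names are hypothetical helpers, not OpenCV calls.

```python
def fourcc_to_int(code):
    """Pack a four-character code into an int, first character in the low byte."""
    if len(code) != 4:
        raise ValueError("a fourcc needs exactly four characters")
    value = 0
    for i, ch in enumerate(code):
        value |= (ord(ch) & 0xFF) << (8 * i)
    return value


def int_to_fourcc(value):
    """Unpack an int (or the float a property getter returns) into four characters."""
    value = int(value)
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))


# Round trip for the code used in the writer example.
packed = fourcc_to_int("mp4v")
print(hex(packed), int_to_fourcc(packed))
```

Decoding the fourcc of a file that already plays on the target system is one way to choose a code to try first in the writer.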
torchvision: read_video
from torchvision.io import read_video
video, audio, info = read_video("clip.mp4", start_pts=0, end_pts=4, pts_unit="sec")
# video: (T, H, W, C) uint8 in RGB order
print(video.shape, info)
Useful for training clips; for long files prefer decoders that stream frames to limit RAM.
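One simple way to bound memory with read_video is to load a long file in fixed-length windows rather than all at once. The sketch below only computes the windows; `clip_windows` is a hypothetical helper, and it assumes the total duration in seconds is already known (for example from container metadata).

```python
import math


def clip_windows(duration_sec, clip_sec):
    """Split [0, duration_sec] into consecutive (start, end) windows in seconds.

    Hypothetical helper; the last window is shorter when the duration
    is not a multiple of clip_sec.
    """
    if clip_sec <= 0:
        raise ValueError("clip_sec must be positive")
    count = math.ceil(duration_sec / clip_sec)
    windows = []
    for i in range(count):
        start = i * clip_sec
        end = min(start + clip_sec, duration_sec)
        windows.append((start, end))
    return windows


# Intended use with torchvision (needs torchvision and a real file):
#   from torchvision.io import read_video
#   for start, end in clip_windows(60.0, 4.0):
#       video, audio, info = read_video(
#           "clip.mp4", start_pts=start, end_pts=end, pts_unit="sec")
```

Adjacent windows share a boundary timestamp, so a frame that falls exactly on a boundary may appear in both clips; drop the duplicate if that matters for your task.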
Takeaways
- BGR in OpenCV vs RGB in many deep models; convert with cv2.cvtColor when needed.
- Temporal methods need consistent FPS or explicit timestamps.
- Next: optical flow ties neighboring frames through motion fields.
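For the FPS point, a rough sketch of one approach: choose which source frames to keep so that clips recorded at different rates end up at a common rate, and carry explicit timestamps alongside. Both function names are hypothetical, and the nearest-earlier-frame rule used here is only one of several reasonable choices.

```python
def resample_indices(n_frames, src_fps, dst_fps):
    """Pick source frame indices that approximate playback at dst_fps.

    Each output frame k maps to the source frame at or just before
    time k / dst_fps. Hypothetical helper.
    """
    if src_fps <= 0 or dst_fps <= 0:
        # A reported FPS of 0 usually means the backend could not determine it.
        raise ValueError("fps values must be positive")
    n_out = int(n_frames * dst_fps / src_fps)
    return [min(n_frames - 1, int(k * src_fps / dst_fps)) for k in range(n_out)]


def timestamps_sec(indices, src_fps):
    """Timestamp in seconds of each source frame index."""
    return [i / src_fps for i in indices]


# Halving the rate keeps every other frame; doubling it repeats frames.
print(resample_indices(6, 30, 15))
print(resample_indices(4, 10, 20))
```

Raising on a non-positive FPS is deliberate: silently dividing by a missing frame rate would produce timestamps that look valid but are not.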
Quick FAQ
For grayscale frames, try cap.set(cv2.CAP_PROP_CONVERT_RGB, 0) where supported, or call cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) after read.