Video Processing MCQ
15 Questions | Time: ~25 mins | Intermediate

Video adds a time axis to vision. This quiz covers sampling frames, modeling motion, and building representations for video tasks.

Easy: 5 Q | Medium: 6 Q | Hard: 4 Q
Frames: time axis
Clip: T seconds
Motion: temporal change
3D conv: space–time

Vision meets time

Video is a sequence of frames (or volumetric data). A typical pipeline samples clips at a chosen frame rate, optionally computes optical flow, and models the clip with 3D convolutions or with transformers over space-time tokens. Tasks include action recognition, detection in video, and video generation.

Temporal context

Single frames may be ambiguous; neighboring frames disambiguate motion and actions.

Key ideas

Frame sampling

FPS, stride, and clip length trade compute against motion cues: denser or longer clips capture more motion but cost more to process.
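To make the trade-off concrete, here is a minimal sketch of strided frame sampling. The function names and the example numbers (a 300-frame video at 30 FPS) are illustrative assumptions, not part of any particular library.

```python
def sample_clip_indices(num_frames, clip_len, stride, start=0):
    """Return the indices of `clip_len` frames spaced `stride` frames apart."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last = start + (clip_len - 1) * stride  # index of the last sampled frame
    if start < 0 or last >= num_frames:
        raise ValueError("clip does not fit inside the video")
    return [start + i * stride for i in range(clip_len)]


def clip_span_seconds(clip_len, stride, fps):
    """Time between the first and last sampled frame, in seconds."""
    return (clip_len - 1) * stride / fps


# Hypothetical video: 300 frames at 30 FPS; sample 8 frames, every 4th frame.
indices = sample_clip_indices(num_frames=300, clip_len=8, stride=4)
span = clip_span_seconds(clip_len=8, stride=4, fps=30)
```

With these settings the model sees only 8 frames but covers a little under one second of motion. Raising the stride widens the temporal window at the same compute cost, at the risk of missing fast motion.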

3D convolution

Extends convolution kernels across time as well as space, so a single operation mixes spatial and temporal information.
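A naive single-channel sketch may help show what "one op over time and space" means. It uses plain nested lists, no padding, and stride 1, and like most deep-learning layers it computes cross-correlation (the kernel is not flipped). Real frameworks add channels, batching, and far faster implementations; the function name here is an assumption for illustration.

```python
def conv3d_valid(video, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) video with no padding."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal AND spatial extent at once.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy video: 3 frames of 2x2 pixels, where every pixel in frame t equals t.
video = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
# A 2x1x1 kernel that subtracts the previous frame from the current one.
temporal_diff = [[[-1.0]], [[1.0]]]
response = conv3d_valid(video, temporal_diff)
```

The temporal-difference kernel responds to change between consecutive frames, which is the kind of motion cue a purely 2D kernel cannot express.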

Two-stream

An RGB (appearance) path and an optical-flow (motion) path are fused for action recognition.
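The text above does not say how the two paths are fused. One common option is late fusion, where each stream produces class scores and the scores are averaged. The sketch below assumes that choice; the logits and the `flow_weight` parameter are made-up illustrative values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of per-stream class probabilities."""
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    return [(1.0 - flow_weight) * a + flow_weight * b
            for a, b in zip(p_rgb, p_flow)]


# Hypothetical 3-class example: the RGB stream prefers class 0,
# while the flow stream strongly prefers class 1.
fused = late_fusion([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Here the confident motion stream outweighs the less certain appearance stream. Fusing intermediate features instead of final scores is another design choice, not shown here.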

Memory

RNNs or attention aggregate information over time from per-frame CNN features.
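As a minimal sketch of the attention option, the snippet below pools per-frame feature vectors with a single query vector using scaled dot-product scores. In a trained model the query would be learned; here it is a hand-picked assumption, and the features are toy values standing in for CNN outputs.

```python
import math


def attention_pool(frame_features, query):
    """Pool T per-frame feature vectors into one, weighted by similarity to `query`."""
    d = len(query)
    # Scaled dot-product score for each frame.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) / math.sqrt(d)
              for f in frame_features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features, one output value per feature dimension.
    pooled = [sum(w * f[j] for w, f in zip(weights, frame_features))
              for j in range(d)]
    return pooled, weights


# Toy per-frame features for a 3-frame clip (2 dimensions each).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# An all-zero query scores every frame equally, so this reduces to mean pooling.
pooled, weights = attention_pool(features, query=[0.0, 0.0])
```

A non-zero query shifts weight toward the frames it matches, which is how attention can emphasize the informative moments in a clip rather than averaging everything.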

Simple pipeline

decode clip → preprocess → temporal model → task head
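The four stages can be sketched as composable functions. Everything below is a toy stand-in under stated assumptions: frames are flat lists of 8-bit pixel values, the "temporal model" is just mean absolute frame difference, and the task head is a threshold. A real system would decode with a video library and use a learned model.

```python
def decode_clip(frames, start, clip_len, stride=1):
    """Stage 1: pick the frames that form the clip (decoding is assumed done)."""
    return [frames[start + i * stride] for i in range(clip_len)]


def preprocess(clip):
    """Stage 2: scale 8-bit pixel values to the range [0, 1]."""
    return [[p / 255.0 for p in frame] for frame in clip]


def temporal_model(clip):
    """Stage 3 (stand-in): mean absolute difference between consecutive frames."""
    diffs = []
    for prev, cur in zip(clip, clip[1:]):
        diffs.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return sum(diffs) / len(diffs)


def task_head(motion_score, threshold=0.05):
    """Stage 4 (stand-in): turn the temporal feature into a label."""
    return "moving" if motion_score > threshold else "static"


def run_pipeline(frames, start=0, clip_len=4, stride=1):
    clip = decode_clip(frames, start, clip_len, stride)
    return task_head(temporal_model(preprocess(clip)))


# Two hypothetical 8-frame videos with 4 pixels per frame.
static_video = [[10, 10, 10, 10] for _ in range(8)]
moving_video = [[(i * 40) % 256] * 4 for i in range(8)]
labels = (run_pipeline(static_video), run_pipeline(moving_video))
```

Keeping each stage behind its own function makes it easy to swap the stand-ins for a real decoder, a 3D CNN or transformer, and a classification head.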

Pro tip: For heavy videos, decode on the fly with parallel workers; for long-form video, cache sparsely sampled clips.