Action Recognition MCQ
Classify what is happening in a clip by combining appearance and motion cues over time.
Human action
Classes
Clip
Spatiotemporal
Two-stream
RGB + flow
I3D
Inflated 3D
Actions in video
Action recognition assigns a label (e.g. diving, waving) to a short clip. Methods range from per-frame CNNs with temporal pooling, to two-stream fusion of RGB and optical flow, to 3D convolutions (C3D, I3D) and transformers over space-time tokens. Large labeled datasets such as Kinetics drive supervised pretraining.
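The "inflated" in I3D refers to initializing 3D filters from pretrained 2D ones: each 2D kernel is repeated T times along the time axis and divided by T, so a clip made of one repeated frame gives the same response as the 2D filter on that frame. The sketch below checks this on a single filter in plain Python. The helper names (`inflate_2d_kernel`, `response_2d`, `response_3d`) and the numbers are hypothetical; a real model would apply the same idea to whole weight tensors in a deep learning framework.

```python
def inflate_2d_kernel(kernel_2d, t):
    """Repeat a 2D kernel t times along time and rescale by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]


def response_2d(patch, kernel_2d):
    """Filter response at one location (patch and kernel share a shape)."""
    return sum(p * w
               for p_row, k_row in zip(patch, kernel_2d)
               for p, w in zip(p_row, k_row))


def response_3d(clip_patch, kernel_3d):
    """Same, for a (t, h, w) clip patch and a 3D kernel of that shape."""
    return sum(response_2d(frame, k2d)
               for frame, k2d in zip(clip_patch, kernel_3d))


# A 3x3 vertical-edge kernel and a 3x3 image patch (illustrative values).
kernel = [[-1.0, 0.0, 1.0],
          [-2.0, 0.0, 2.0],
          [-1.0, 0.0, 1.0]]
patch = [[0.0, 0.5, 1.0],
         [0.0, 0.5, 1.0],
         [0.0, 0.5, 1.0]]

t = 4
static_clip = [patch] * t                 # the same frame repeated t times
kernel_3d = inflate_2d_kernel(kernel, t)

r2d = response_2d(patch, kernel)
r3d = response_3d(static_clip, kernel_3d)  # matches r2d on a static clip
```

Once training starts, the T temporal slices are free to diverge, which is how the inflated filter can pick up motion that the 2D filter could not see.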
Why motion matters
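Static frames can be ambiguous; temporal patterns distinguish many action classes. For example, a single frame of a half-open door does not reveal whether it is being opened or closed, but the order of frames does.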
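A toy illustration of this point, assuming frames are 1D rows of pixels: a dot moving right and the same clip played in reverse contain exactly the same frames, so an order-agnostic average gives identical features, while a simple motion cue (how far the intensity centroid travels) separates the two. The helper names are hypothetical.

```python
def mean_pool(clip):
    """Average frames over time; the result ignores frame order."""
    n = len(clip)
    return [sum(frame[i] for frame in clip) / n for i in range(len(clip[0]))]


def centroid(frame):
    """Intensity-weighted mean position of a 1D frame."""
    total = sum(frame)
    return sum(i * v for i, v in enumerate(frame)) / total


def net_displacement(clip):
    """Signed centroid movement from the first to the last frame."""
    return centroid(clip[-1]) - centroid(clip[0])


width, length = 6, 4
# Frame t has a single bright pixel at position t.
moving_right = [[1.0 if x == t else 0.0 for x in range(width)]
                for t in range(length)]
moving_left = moving_right[::-1]          # same frames, reversed order

same_appearance = mean_pool(moving_right) == mean_pool(moving_left)
disp_right = net_displacement(moving_right)
disp_left = net_displacement(moving_left)
```

Here the pooled appearance is identical for both clips, while the displacement has opposite signs, which is the kind of cue that optical flow or 3D filters can exploit.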
Key ideas
Clip input
Fixed-length segment sampled from longer video.
Two-stream
Separate networks for appearance (RGB) and motion (optical flow), fused into one prediction.
3D CNN
Spatiotemporal filters learn motion templates.
Kinetics
Large-scale labeled clips for pretraining.
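Two of the ideas above, clip sampling and two-stream fusion, can be sketched in a few lines. This is a minimal illustration, not a reference pipeline: `sample_clip_indices` and `late_fusion` are hypothetical helpers, strided sampling from a start index is only one possible policy, and the per-stream scores are made-up numbers standing in for the outputs of an RGB network and an optical-flow network. Averaging class scores is one common way to fuse the streams.

```python
def sample_clip_indices(num_frames, clip_len, stride=1, start=0):
    """Frame indices of a fixed-length clip.

    If the video is too short, the last frame index is repeated.
    """
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


def late_fusion(rgb_scores, flow_scores, rgb_weight=0.5):
    """Weighted average of per-class scores from the two streams."""
    return [rgb_weight * r + (1.0 - rgb_weight) * f
            for r, f in zip(rgb_scores, flow_scores)]


# An 8-frame clip, every 4th frame, starting at frame 10 of a 300-frame video.
indices = sample_clip_indices(num_frames=300, clip_len=8, stride=4, start=10)

# Illustrative class probabilities (assumed, not model outputs).
classes = ["diving", "waving", "running"]
rgb_scores = [0.50, 0.40, 0.10]    # appearance stream is unsure
flow_scores = [0.20, 0.70, 0.10]   # motion stream favors "waving"

fused = late_fusion(rgb_scores, flow_scores)
predicted = classes[fused.index(max(fused))]
```

In this made-up case the appearance stream alone would pick "diving", and the motion stream tips the fused prediction to "waving".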
Typical head
backbone features → temporal aggregate → softmax over action classes
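The pipeline above can be sketched with plain Python lists. Everything here is an assumption for illustration: the per-frame feature vectors stand in for backbone outputs, mean pooling is only one choice of temporal aggregate, a linear classifier is placed before the softmax to produce one logit per class, and the weights are hand-picked rather than learned.

```python
import math


def temporal_mean(features):
    """Average per-frame feature vectors (T x D) into one clip vector (D)."""
    t = len(features)
    return [sum(f[d] for f in features) / t for d in range(len(features[0]))]


def linear(x, weights, bias):
    """One logit per class; weights has shape (num_classes x D)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in backbone output: 4 frames, 3-dimensional features.
frame_features = [[1.0, 0.0, 2.0],
                  [1.0, 0.2, 1.8],
                  [0.8, 0.0, 2.2],
                  [1.2, 0.2, 2.0]]

# Hand-picked classifier for 2 action classes (hypothetical values).
weights = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
bias = [0.0, 0.0]

clip_feature = temporal_mean(frame_features)
probs = softmax(linear(clip_feature, weights, bias))
```

Swapping `temporal_mean` for max pooling, attention, or a recurrent layer changes the aggregate without touching the rest of the head.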