3D Vision Introduction: 20 Essential Q&A
From 2D images to depth and geometry—stereo, monocular cues, and 3D representations.
~11 min read
20 questions
Advanced
depth, stereo, point cloud, pinhole
1
What is 3D computer vision?
⚡ easy
Answer: Reasoning about geometry of the scene—depth, shape, pose, and 3D structure—from images, video, or range sensors.
2
Depth from stereo?
📊 medium
Answer: Triangulate corresponding points in two calibrated views; the baseline provides parallax, and disparity is inversely proportional to depth.
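A minimal numpy sketch of the disparity-to-depth relation for a rectified pair; the focal length, baseline, and disparities below are illustrative placeholders:

```python
import numpy as np

# Assumed rectified stereo rig: focal length in pixels, baseline in metres.
f_px = 720.0        # hypothetical focal length
baseline_m = 0.12   # hypothetical baseline

disparity_px = np.array([4.0, 18.0, 90.0])   # matched-pixel shifts

# Depth from similar triangles: Z = f * B / d (valid for d > 0).
depth_m = f_px * baseline_m / disparity_px
print(depth_m)   # larger disparity -> smaller depth (closer object)
```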
3
Define disparity.
📊 medium
Answer: Horizontal shift between conjugate pixels in a rectified stereo pair; larger disparity means a closer object (for a standard fronto-parallel rig).
4
What is the epipolar constraint?
🔥 hard
Answer: Corresponding point in second image lies on a line (epipolar line)—reduces matching from 2D search to 1D after rectification.
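A sketch of checking the constraint numerically, assuming a fundamental matrix F is already available; the matrix and pixel coordinates below are placeholders, not from a real calibration:

```python
import numpy as np

# Placeholder fundamental matrix (in practice estimated from >= 8 matches).
F = np.array([[ 0.0,  -1e-4,  0.02],
              [ 1e-4,  0.0,  -0.03],
              [-0.02,  0.03,  0.0 ]])

x1 = np.array([320.0, 240.0, 1.0])   # homogeneous pixel in image 1
x2 = np.array([300.0, 242.0, 1.0])   # candidate match in image 2

# Epipolar constraint: x2^T F x1 should be ~0 for a true correspondence.
residual = x2 @ F @ x1

# The epipolar line in image 2 on which the match must lie: l2 = F x1.
l2 = F @ x1
print(residual, l2)
```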
5
Monocular depth?
📊 medium
Answer: Uses cues (perspective, texture, learned priors) or supervised/self-supervised CNNs—scale ambiguous without extra info.
6
What is a point cloud?
⚡ easy
Answer: Set of 3D points (x,y,z), often with color/normal—raw output of LiDAR/stereo fusion or depth cameras.
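A sketch of building a point cloud by back-projecting a depth map through assumed intrinsics; the fx, fy, cx, cy and depth values are illustrative:

```python
import numpy as np

# Hypothetical intrinsics and a tiny depth map (metres).
fx, fy, cx, cy = 500.0, 500.0, 2.0, 1.5
depth = np.array([[1.0, 1.1, 1.2, 1.3],
                  [1.0, 1.0, 1.1, 1.2],
                  [0.9, 1.0, 1.0, 1.1]])

v, u = np.indices(depth.shape)           # pixel row/column coordinates
z = depth
x = (u - cx) / fx * z                    # back-project: X = (u - cx) * Z / fx
y = (v - cy) / fy * z
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)   # N x 3 point cloud
print(points.shape)
```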
7
Voxel vs mesh?
📊 medium
Answer: Voxel grid discretizes 3D space—good for conv nets; mesh stores vertices+faces—compact for graphics and surface reasoning.
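A sketch of voxelising a point cloud into a boolean occupancy grid, with a placeholder cloud and an assumed voxel size:

```python
import numpy as np

points = np.random.rand(1000, 3) * 2.0   # placeholder N x 3 point cloud
voxel_size = 0.1

# Quantise each point to a voxel index, then mark occupied cells.
idx = np.floor(points / voxel_size).astype(int)
dims = idx.max(axis=0) + 1
grid = np.zeros(dims, dtype=bool)
grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
print(grid.shape, grid.sum(), "occupied voxels")
```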
8
Pinhole camera model?
📊 medium
Answer: Projects 3D X to image x via similar triangles: x = K [R|t] X (homogeneous)—basis for calibration and triangulation.
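A sketch of the projection itself, with assumed K, R, and t (illustrative values only); the same K and [R|t] appear in the next two questions:

```python
import numpy as np

# Assumed intrinsics K and extrinsics [R | t] (illustrative values only).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                      # camera aligned with the world axes
t = np.array([0.0, 0.0, 0.5])      # camera shifted 0.5 m along Z

X_world = np.array([0.2, -0.1, 2.0])          # 3D point in the world frame

# Pinhole projection: x ~ K (R X + t), then divide by the third coordinate.
x_hom = K @ (R @ X_world + t)
u, v = x_hom[:2] / x_hom[2]
print(u, v)
```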
9
Intrinsic matrix K?
📊 medium
Answer: Maps camera coordinates to pixels: focal lengths f_x,f_y and principal point c_x,c_y; may include skew in general form.
10
Extrinsics?
📊 medium
Answer: Rotation R and translation t from world to camera frame—pose of camera in scene.
11
RGB-D cameras?
⚡ easy
Answer: Structured light or time-of-flight provides registered depth + color (Kinect, RealSense); no stereo baseline needed, but limited range and sensor artifacts.
12
LiDAR?
📊 medium
Answer: Active ranging by laser pulses—sparse accurate 3D, widely used in autonomy; different noise profile than passive stereo.
13
Structure from motion?
📊 medium
Answer: Estimate sparse 3D points and camera poses from many images—basis of photogrammetry pipelines.
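One core step sketched below: linear (DLT) triangulation of a single point from two known projection matrices; the cameras and the point are placeholders:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation: solve A X = 0 built from x ~ P X for two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # dehomogenise

# Placeholder cameras: identity camera and a second one translated along X.
K = np.diag([700.0, 700.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.1, 0.05, 3.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, x1, x2))   # ~ [0.1, 0.05, 3.0]
```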
14
SLAM in one line?
📊 medium
Answer: Simultaneously localize sensor and build map of environment—needs data association and loop closure.
15
What is NeRF?
🔥 hard
Answer: Neural radiance field represents scene as MLP of density+color in 5D (x,y,z,θ,φ)—novel view synthesis; hot research direction.
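One small but central ingredient is the sinusoidal positional encoding applied to the inputs before the MLP; a sketch with an assumed number of frequency bands:

```python
import numpy as np

def positional_encoding(p, num_freqs=6):
    """Map each coordinate to [sin(2^k * pi * p), cos(2^k * pi * p)] features."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    scaled = p[..., None] * freqs             # (..., dim, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)     # flatten per-point features

xyz = np.array([[0.1, -0.3, 0.7]])           # one sample point along a ray
print(positional_encoding(xyz).shape)         # (1, 3 * 2 * 6) = (1, 36)
```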
16
Scale ambiguity?
📊 medium
Answer: Monocular SfM/SLAM recovers geometry up to similarity transform without metric scale—IMU or known object fixes scale.
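A sketch of recovering metric scale from one known distance (e.g. a measured checkerboard edge or stereo baseline); the values are placeholders:

```python
import numpy as np

# Up-to-scale reconstruction: two 3D points whose true separation is known.
p_a = np.array([0.12, 0.40, 1.90])      # reconstructed, arbitrary units
p_b = np.array([0.55, 0.41, 1.88])
known_distance_m = 0.80                  # e.g. a measured checkerboard edge

scale = known_distance_m / np.linalg.norm(p_b - p_a)
points_metric = scale * np.stack([p_a, p_b])   # apply to the whole map/poses
print(scale, points_metric)
```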
17
What is ICP?
📊 medium
Answer: Iterative Closest Point aligns two point clouds by minimizing distances between correspondences—registration and tracking.
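A minimal ICP sketch with brute-force nearest neighbours and the closed-form (Kabsch/SVD) alignment step; the clouds are synthetic placeholders:

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: nearest-neighbour matching, then best-fit R, t (Kabsch)."""
    # Brute-force nearest neighbours from src to dst.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = dst[d2.argmin(axis=1)]

    # Closed-form rigid alignment of src onto its matched points.
    mu_s, mu_d = src.mean(0), nn.mean(0)
    H = (src - mu_s).T @ (nn - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # fix a reflection if one appears
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return src @ R.T + t, R, t

# Placeholder clouds: dst is src slightly rotated and shifted.
rng = np.random.default_rng(0)
src = rng.random((200, 3))
angle = 0.1
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
dst = src @ Rz.T + np.array([0.05, -0.02, 0.0])

aligned = src
for _ in range(20):                       # iterate match + align
    aligned, R, t = icp_step(aligned, dst)
print(np.abs(aligned - dst).max())        # residual should shrink toward zero
```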
18
BEV representation?
📊 medium
Answer: Top-down grid of scene used in driving—fuses multi-view or LiDAR into 2D bird’s-eye feature maps for detection/planning.
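A sketch of rasterising a LiDAR sweep into simple BEV features (occupancy and max height per cell); the grid extents, resolution, and points are assumed:

```python
import numpy as np

# Placeholder LiDAR sweep: N x 3 points in the ego frame (x forward, y left).
points = np.random.default_rng(1).uniform([-10, -10, -2], [40, 10, 2], (5000, 3))

# BEV grid: 0.25 m cells covering x in [0, 40) m and y in [-10, 10) m.
res, x_min, y_min = 0.25, 0.0, -10.0
nx, ny = int(40 / res), int(20 / res)

ix = np.floor((points[:, 0] - x_min) / res).astype(int)
iy = np.floor((points[:, 1] - y_min) / res).astype(int)
keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

# Simple BEV features: occupancy and max height per cell.
occupancy = np.zeros((nx, ny))
max_height = np.full((nx, ny), -np.inf)
occupancy[ix[keep], iy[keep]] = 1.0
np.maximum.at(max_height, (ix[keep], iy[keep]), points[keep, 2])
print(occupancy.sum(), "occupied cells")
```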
19
Applications?
⚡ easy
Answer: AR overlay needs 6-DoF pose; robotics needs grasp planning and collision checking; both rely on reliable 3D perception.
20
Example datasets?
⚡ easy
Answer: KITTI, nuScenes, ScanNet, ShapeNet—each emphasizes driving, multi-sensor, indoor scans, or CAD models respectively.
3D Vision Cheat Sheet
Stereo
- Disparity → depth
- Epipolar line
Models
- K, [R|t]
- Pinhole
Data
- Point cloud
- RGB-D / LiDAR
💡 Pro tip: Stereo needs calibration + rectification for 1D disparity search.
Full tutorial track
Go deeper with the matching tutorial chapter and code examples.