Updated 2026
Image Processing Basics: 20 Essential Q&A
Digital image fundamentals—how pixels, sampling, quantization, and storage show up in interviews.
~10 min read
20 questions
Beginner
pixels resolution channels JPEG/PNG NumPy/OpenCV
Quick Navigation
1. What is a digital image?
2. What is a pixel?
3. Sampling vs quantization
4. Resolution & aspect ratio
5. Image channels
6. Grayscale from RGB
7. Bit depth & dynamic range
1
What is a digital image in computer vision?
⚡ easy
Answer: A 2D (or 2D+channels) grid of samples where each cell is a pixel storing numeric intensity or color. It is a discrete approximation of a continuous scene after capture by a sensor and analog-to-digital conversion.
2
What is a pixel?
⚡ easy
Answer: The smallest addressable element of a raster image. Each pixel holds one or more values (e.g. gray level or R,G,B). Spatially, pixels sit on a regular grid; physically, they correspond to sensor photosites plus processing (demosaicing for color cameras).
3
Explain sampling and quantization.
📊 medium
Answer: Sampling chooses discrete spatial locations (grid resolution). Quantization maps continuous intensity to finite levels (bit depth). Together they convert a continuous image to digital form and introduce spatial and intensity approximation error.
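Both steps can be sketched in a few lines of NumPy on a synthetic ramp image (quantize below is an illustrative helper, not a library function):

```python
import numpy as np

# Synthetic 8-bit ramp image covering all 256 intensities.
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

# Sampling: keep every 2nd pixel in each direction (coarser spatial grid).
sampled = img[::2, ::2]

# Quantization: collapse 256 intensity levels down to n_levels bins.
def quantize(img, n_levels):
    step = 256 // n_levels
    return (img // step) * step  # each pixel snaps to its bin's floor value

q = quantize(img, 4)  # only 4 distinct intensities remain
```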
4
What is image resolution?
⚡ easy
Answer: Usually the grid size width × height in pixels (e.g. 1920×1080). Higher resolution preserves finer detail but costs memory and compute. Aspect ratio is width/height; changing resolution without preserving ratio stretches content.
5
What are color channels?
⚡ easy
Answer: Separate 2D arrays (or stacked planes) per color component—commonly R, G, B for display. Grayscale has one channel. Multispectral/hyperspectral images have many bands beyond visible RGB.
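A minimal NumPy sketch of channels as stacked 2D planes:

```python
import numpy as np

# A 4x4 RGB image stored as three stacked 2D planes: shape (H, W, 3).
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255                                 # fill the red plane

# Each channel is itself a plain 2D array (these are views, not copies):
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Stacking the planes back along the last axis reproduces the image.
restacked = np.stack([r, g, b], axis=-1)
```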
6
How is grayscale often computed from RGB?
⚡ easy
Answer: A weighted sum approximating luminance, e.g. 0.299R + 0.587G + 0.114B (ITU-R BT.601) or simpler averages for rough work. Weights reflect human sensitivity to green; the exact formula depends on standard and use case.
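The BT.601 weighted sum is a one-liner in NumPy (rgb_to_gray is an illustrative helper; rounding before the uint8 cast avoids truncation bias):

```python
import numpy as np

def rgb_to_gray(rgb):
    # BT.601 luma weights; rgb is H x W x 3, result is H x W uint8.
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(float) @ weights).astype(np.uint8)

white = np.full((2, 2, 3), 255, dtype=np.uint8)
gray = rgb_to_gray(white)   # pure white maps to full intensity
```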
7
What is bit depth? Why does it matter?
📊 medium
Answer: Bits per channel (e.g. 8-bit → 256 levels). Higher depth reduces banding and helps medical/raw workflows; 8-bit uint is standard for web and many CV datasets. HDR may use 16/32-bit float linear pipelines before tone mapping.
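A quick sketch of the level count and of widening 8-bit data to 16-bit without changing relative brightness (the factor 257 maps 255 to 65535, the new full scale):

```python
import numpy as np

bits = 8
levels = 2 ** bits                        # 256 representable values per channel

img8 = np.array([[0, 128, 255]], dtype=np.uint8)
# Widen to 16-bit: multiply by 257 so black stays 0 and white hits full scale.
img16 = img8.astype(np.uint16) * 257
```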
8
How are pixel coordinates usually indexed?
⚡ easy
Answer: Often (row, col) or (y, x) with origin at top-left, row increasing downward—matching matrix indexing in NumPy/OpenCV. Be careful when converting to math coordinates where y may increase upward.
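The convention in NumPy, as a sketch:

```python
import numpy as np

img = np.zeros((100, 200), dtype=np.uint8)   # 100 rows (y), 200 columns (x)
img[10, 50] = 255                            # row 10, col 50 -> point (x=50, y=10)

# shape is (height, width): width comes second, not first.
h, w = img.shape
```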
9
What does tensor shape (H, W, C) mean?
📊 medium
Answer: Height (rows), width (columns), channels—typical for NumPy/OpenCV images. PyTorch often uses (N, C, H, W) for batches. Interviews check you can transpose between layouts without mixing H/W.
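Converting between the two layouts is a transpose plus a batch dimension, sketched below:

```python
import numpy as np

hwc = np.zeros((480, 640, 3), dtype=np.float32)   # H, W, C (NumPy/OpenCV)
chw = np.transpose(hwc, (2, 0, 1))                # C, H, W (PyTorch-style)
nchw = chw[None]                                  # add batch dim -> N, C, H, W
```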
10
Raster vs vector graphics?
⚡ easy
Answer: Raster: pixel grid (photos, textures). Vector: curves/paths (SVG, fonts)—infinite resolution until rasterized. CV pipelines usually consume raster tensors; vector assets are rasterized for learning.
11
When choose JPEG vs PNG?
⚡ easy
Answer: JPEG: photos, smaller files, lossy, poor for sharp edges/text. PNG: lossless, transparency, screenshots and graphics. For repeated ML saves, beware JPEG compression artifacts affecting edges and noise.
12
What problems can lossy compression cause for CV?
📊 medium
Answer: Blocking, ringing, color bleeding—especially around edges. Models may overfit artifact patterns. For training data, prefer lossless or high-quality JPEG; for deployment, know your camera/codec pipeline.
13
What is aliasing when downsampling?
📊 medium
Answer: High-frequency detail folds into low frequencies as moiré or jaggies if you shrink without low-pass filtering. Fix: blur then downsample or use good resampling (area interpolation for downscaling in OpenCV).
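The blur-then-downsample fix can be sketched as block averaging, which is similar in spirit to what area interpolation does for integer shrink factors (downsample_area is an illustrative helper, not a library function):

```python
import numpy as np

def downsample_area(img, factor):
    # Average each factor x factor block: a crude low-pass filter fused
    # with subsampling.
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor].astype(float)
    return img.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

# 1-pixel checkerboard: the worst case for naive subsampling.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
naive = checker[::2, ::2]            # aliases: samples only one phase of the pattern
safe = downsample_area(checker, 2)   # averages to mid-gray instead
```

Naive striding turns the checkerboard into a solid image (all one phase), while block averaging reports the true mid-gray average of each region.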
14
Nearest-neighbor vs bilinear interpolation?
📊 medium
Answer: Nearest: fast, blocky, preserves original values. Bilinear: smooths using 4 neighbors, better for resizing/rotation but blurs fine detail. Bicubic is smoother still; choice affects augmentation and geometric transforms.
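Both schemes can be sketched from scratch in NumPy (illustrative helpers with simplified edge handling; real libraries differ in corner conventions):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    # Each output pixel copies the closest source pixel: blocky, exact values.
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def resize_bilinear(img, out_h, out_w):
    # Each output pixel blends its 4 nearest source pixels by distance.
    h, w = img.shape[:2]
    y = (np.arange(out_h) + 0.5) * h / out_h - 0.5
    x = (np.arange(out_w) + 0.5) * w / out_w - 0.5
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = np.clip(y - y0, 0, 1)[:, None]
    wx = np.clip(x - x0, 0, 1)[None, :]
    f = img.astype(float)
    top = f[y0[:, None], x0] * (1 - wx) + f[y0[:, None], x1] * wx
    bot = f[y1[:, None], x0] * (1 - wx) + f[y1[:, None], x1] * wx
    return top * (1 - wy) + bot * wy
```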
15
Typical dtypes for images in NumPy?
⚡ easy
Answer: uint8 [0,255] most common. Float images may be [0,1] or [0,255] depending on library—always normalize consistently before math or neural nets.
import numpy as np
img = np.zeros((480, 640, 3), dtype=np.uint8) # H,W,C
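Beyond range conventions, uint8 arithmetic silently wraps around, which is a classic source of bugs. A small sketch:

```python
import numpy as np

img = np.full((2, 2, 3), 255, dtype=np.uint8)
as_float = img.astype(np.float32) / 255.0      # move to the [0, 1] convention

a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)
wrapped = a + b                     # uint8 wraps: (200 + 100) % 256 == 44
safe = a.astype(np.int16) + b       # widen first to get the true 300
```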
16
Why does OpenCV use BGR?
⚡ easy
Answer: Historical: early camera drivers and Windows bitmap formats stored bytes in BGR order, and OpenCV kept the convention. cv2.imread returns BGR. Convert to RGB for matplotlib or PIL-centric code: cv2.cvtColor(img, cv2.COLOR_BGR2RGB). Mixing orders is a common interview “debugging” trap.
import cv2
bgr = cv2.imread('x.jpg')
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
17
What is the alpha channel?
⚡ easy
Answer: Per-pixel opacity for compositing (RGBA). Not always present. When loading to 3-channel models, you often drop alpha or premultiply RGB depending on graphics pipeline.
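The two common options, dropping alpha or flattening onto a background, sketched in NumPy:

```python
import numpy as np

rgba = np.zeros((2, 2, 4), dtype=np.uint8)   # black image with an alpha plane
rgba[..., 3] = 255                           # fully opaque

rgb = rgba[..., :3]                          # simplest option: drop alpha
alpha = rgba[..., 3:4].astype(float) / 255.0

# Or composite onto a white background before dropping the channel:
white = np.full(rgb.shape, 255, dtype=float)
flattened = (rgb * alpha + white * (1.0 - alpha)).astype(np.uint8)
```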
18
What does an image histogram show?
📊 medium
Answer: The distribution of pixel intensities (per channel or gray). Useful for exposure diagnosis, thresholding intuition, and contrast enhancement—foundation for histogram equalization (covered in later chapters).
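A histogram with one bin per intensity, sketched with np.histogram (a random test image stands in for a real one):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# One bin per intensity: hist[i] counts the pixels with value i.
hist, edges = np.histogram(img, bins=256, range=(0, 256))
```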
19
How does a video relate to images?
⚡ easy
Answer: A sequence of frames (2D images) sampled in time with a frame rate (FPS). Temporal redundancy enables compression and tracking; many CV models treat frames independently at first.
20
What is EXIF metadata?
⚡ easy
Answer: Embedded tags in JPEG/TIFF: orientation, camera settings, timestamp, GPS. The orientation tag can rotate images—some loaders ignore it, causing inconsistent training data; preprocess to canonical orientation.
Image Basics Cheat Sheet
Representation
- Grid of pixels
- Sampling + quantization
- H×W×C / dtypes
Quality
- Resolution & aspect
- Aliasing on resize
- JPEG artifacts
Code pitfalls
- BGR vs RGB
- float range [0,1] vs [0,255]
- (row,col) vs (x,y)
💡 Pro tip: State image shape, dtype, and color order before any algorithm.