Computer Vision Chapter 2

Image processing basics

How a computer stores a picture: pixels on a grid, brightness and color channels, resolution, and tiny examples in Python so you can connect math to code. Verify snippets on your own machine—library versions differ.

Pixels and the image as a grid

A digital image is usually a rectangular grid of pixels (picture elements). Each cell holds one or more numeric values that describe color or intensity at that location. If the image is W pixels wide and H pixels tall, there are W × H spatial locations; each location may store one value (grayscale) or several (e.g. red, green, blue).

Think of row index going downward and column index going rightward—this matches how NumPy and OpenCV typically lay out 2D arrays (img[y, x] in Python, not [x, y]).

Toy 4×4 grayscale pattern

Each entry below is one pixel; the value is an 8-bit intensity (0 = black, 255 = white). Real images are much larger, but the idea is identical.

The same pattern as a NumPy array (values illustrative):

import numpy as np

# Grayscale image: height 4, width 4, one channel
gray = np.array([
    [40,  80, 120, 160],
    [60, 100, 140, 180],
    [80, 120, 160, 200],
    [100, 140, 180, 220],
], dtype=np.uint8)

print(gray.shape)   # (4, 4)  →  rows, cols
print(gray[1, 2])   # row 1, col 2 → 140

Sampling and quantization

Sampling decides how many pixels you use in space: higher width/height means finer detail but more data. Quantization decides how many distinct values each measurement can take. For common 8-bit channels, each channel has 256 levels (0–255). Medical or HDR workflows may use 12–16 bits per channel for smoother intensity steps.

Together, sampling + quantization control quality vs size: a 12 MP photo with 8-bit RGB needs far more storage than a 320×240 thumbnail, and heavy quantization (e.g. posterization) creates visible bands in smooth regions.
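The banding effect of heavy quantization is easy to reproduce. The sketch below posterizes a synthetic 0–255 ramp down to 4 levels by integer-dividing and re-multiplying; the ramp and level count are illustrative choices, not from any real image.

```python
import numpy as np

# Synthetic gradient: one row ramping smoothly from 0 to 255
ramp = np.arange(256, dtype=np.uint8).reshape(1, 256)

# Quantize to 4 levels (2 bits' worth) by snapping each value
# down to the nearest multiple of the step size
levels = 4
step = 256 // levels                  # 64
posterized = (ramp // step) * step    # values in {0, 64, 128, 192}

print(np.unique(posterized))          # [  0  64 128 192]
```

Displayed as an image, the smooth ramp becomes four flat bands — exactly the posterization artifact described above.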

Spatial resolution

Often written as width × height (e.g. 1920×1080). More pixels can reveal smaller structures; downsampling loses high-frequency detail.

Bit depth

Bits per channel per pixel. 8-bit is standard for display; scientific imaging may prefer higher depth to avoid clipping dark or bright regions.
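To display a higher-depth image on an 8-bit screen you must map its range down. A minimal sketch, assuming a hypothetical 16-bit sensor reading: scaling before the cast preserves the full range, whereas a naive cast to uint8 would wrap or truncate the values.

```python
import numpy as np

# Hypothetical 16-bit readings spanning the full 0..65535 range
raw = np.array([[0, 1000, 40000, 65535]], dtype=np.uint16)

# Scale the 16-bit range into 0..255, then cast to uint8
scaled = (raw / 65535 * 255).round().astype(np.uint8)
print(scaled)  # [[  0   4 156 255]]
```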

Channels, resolution, and aspect ratio

Grayscale vs color

A grayscale image uses a single intensity channel. A typical RGB color image uses three channels (red, green, blue) per pixel. Some images add an alpha channel (RGBA) for transparency. In deep learning, tensors are often shaped (height, width, channels) or (channels, height, width) depending on the framework.
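Converting between the two tensor layouts is a single axis permutation. A minimal sketch with a tiny placeholder array (the 2×3 size is arbitrary):

```python
import numpy as np

# Placeholder image in (H, W, C) layout: height 2, width 3, 3 channels
hwc = np.zeros((2, 3, 3), dtype=np.uint8)

# Reorder axes to (C, H, W), the layout some frameworks expect
chw = np.transpose(hwc, (2, 0, 1))
print(chw.shape)   # (3, 2, 3)

# And back again
back = np.transpose(chw, (1, 2, 0))
print(back.shape)  # (2, 3, 3)
```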

One RGB pixel

At position (y, x), you might read three bytes (R, G, B). Pure red could be (255, 0, 0); mid-gray in RGB is (128, 128, 128) (approximate perceptual gray).

# A single RGB pixel as a 1×1×3 array (H, W, C)
import numpy as np
pixel = np.array([[[255, 128, 0]]], dtype=np.uint8)  # orange-ish
print(pixel.shape)  # (1, 1, 3)

Resolution and aspect ratio

Resolution is the pixel count (or physical dots per inch when printing). Aspect ratio is width:height (e.g. 16:9). Resizing without preserving aspect ratio stretches objects; letterboxing or cropping changes composition—important for detection datasets.
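The arithmetic for an aspect-ratio-preserving resize is just picking the smaller of the two scale factors. A minimal sketch; `fit_within` is a hypothetical helper, not part of any library:

```python
# Compute a target size that fits (w, h) inside a bounding box
# while keeping the width:height ratio unchanged
def fit_within(w, h, max_w, max_h):
    scale = min(max_w / w, max_h / h)
    return max(1, round(w * scale)), max(1, round(h * scale))

# A 16:9 frame fit into a 640×640 box keeps its shape
print(fit_within(1920, 1080, 640, 640))  # (640, 360)
```

Resizing to exactly 640×640 instead would stretch the image vertically; the helper's output leaves a 640×360 image that can be letterboxed if a square input is required.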

Small examples: OpenCV and Pillow

These snippets assume you have installed opencv-python and Pillow in your environment. Replace "photo.jpg" with a real path on your disk.

OpenCV: shape, dtype, one pixel

OpenCV loads color images in BGR order by default (blue, green, red)—not RGB. That matters when you interpret channel indices or mix libraries.

import cv2

img = cv2.imread("photo.jpg")  # None if file missing
if img is not None:
    print(img.shape)   # (H, W, 3); the default flag always loads 3 channels
    print(img.dtype)   # usually uint8
    y, x = 10, 20
    b, g, r = img[y, x].tolist()
    print("BGR at (y,x):", b, g, r)

Convert BGR → RGB for libraries that expect RGB (e.g. Matplotlib): rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
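For the channel swap alone, reversing the last axis of the array does the same job without an OpenCV call. A minimal sketch using a hand-built 1×1 BGR pixel (values illustrative):

```python
import numpy as np

# Pure red in OpenCV's BGR ordering: blue=0, green=0, red=255
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R, yielding RGB order
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [255, 0, 0]
```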

Pillow: size and mode

from PIL import Image

im = Image.open("photo.jpg")
print(im.size)    # (W, H)
print(im.mode)    # "RGB", "L" (grayscale), "RGBA", ...

gray = im.convert("L")
pixels = list(gray.getdata())  # flat sequence — use for tiny images only

For large images, prefer working with NumPy arrays via np.array(im) instead of materializing every pixel in a Python list.
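The round-trip from Pillow to NumPy is one call. The sketch below builds a tiny image in memory (so no file path is needed — the 4×2 size and orange fill are illustrative) and shows that the resulting array follows NumPy's (H, W, C) convention, not Pillow's (W, H) order:

```python
import numpy as np
from PIL import Image

# Tiny in-memory image: width 4, height 2, filled with one color
im = Image.new("RGB", (4, 2), color=(255, 128, 0))

arr = np.array(im)          # NumPy convention: (H, W, C)
print(im.size)              # (4, 2)  →  (W, H)
print(arr.shape)            # (2, 4, 3)
print(arr[0, 0].tolist())   # [255, 128, 0]
```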

File formats and memory (brief)

Lossless formats (e.g. PNG) preserve exact pixel values; lossy formats (e.g. JPEG) discard information to shrink file size—artifacts appear near edges and in skies. For training data, avoid re-saving labeled images repeatedly as JPEG if precision matters.

Raw image memory (order of magnitude): an H × W RGB uint8 image uses about H × W × 3 bytes before compression. A 3000×2000 RGB frame is roughly 18 MB in memory as a dense array.
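The estimate above is a one-line computation — worth doing before loading a large batch of frames into memory:

```python
# Dense uint8 RGB array: one byte per channel per pixel
H, W, C = 2000, 3000, 3
bytes_needed = H * W * C
print(bytes_needed)        # 18000000
print(bytes_needed / 1e6)  # 18.0 (MB, decimal)
```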

Takeaways

  • Images are grids of pixels; indices are usually (row, col) or (y, x) in Python array APIs.
  • Grayscale = 1 channel; color often = 3 (RGB or BGR in OpenCV).
  • Sampling sets spatial detail; quantization sets intensity/color precision (often 8 bits per channel).
  • Always confirm channel order (BGR vs RGB) when mixing OpenCV with other tools.

Quick FAQ

Why does OpenCV use BGR instead of RGB?
Historical reasons and compatibility with older camera and display APIs. In code, convert explicitly when a library expects RGB.

What does uint8 mean?
Unsigned 8-bit integer: whole numbers from 0 to 255. It is the most common storage type for 8-bit display images in NumPy and OpenCV.

Which comes first, width or height?
In OpenCV/NumPy shape, you get (height, width) or (height, width, channels). PIL’s Image.size returns (width, height). Read the docs for each API.