Computer Vision Chapter 39

Face recognition

A typical face recognition system has stages: detect face boxes, align (similarity transform to canonical landmarks), encode to a compact embedding (e.g. 512-D unit vector), then compare embeddings for verification (same person?) or search a gallery for identification. Deep models (ArcFace, CosFace, modern transformers) train with metric-learning losses so same-identity pairs are close in cosine distance. Below: classic OpenCV Haar detection and PyTorch cosine verification on arbitrary embedding tensors.

Face detection: Haar cascade

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
img_bgr = cv2.imread("photo.jpg")  # any BGR image; path is illustrative
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
for (x, y, w, h) in faces:
    cv2.rectangle(img_bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)

Fast and dependency-light; less robust than modern CNN detectors in hard lighting or pose.

DNN face detector (OpenCV)

Download a pretrained face detector from the OpenCV model zoo (Caffe or TensorFlow single-shot detector variants). Load it with cv2.dnn.readNetFromCaffe or cv2.dnn.readNetFromTensorflow, build a blob from the image, call net.setInput and net.forward, then decode the boxes and apply NMS.

net = cv2.dnn.readNetFromTensorflow("opencv_face_detector_uint8.pb",
                                    "opencv_face_detector.pbtxt")
h, w = img_bgr.shape[:2]
blob = cv2.dnn.blobFromImage(img_bgr, 1.0, (300, 300), [104, 117, 123])
net.setInput(blob)
detections = net.forward()
# iterate detections[0,0,i,:] — confidence, box coords — apply threshold + NMS

Exact decoding depends on the model’s output layout; see OpenCV samples for the matching version.
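For the OpenCV zoo face detector, the output is a [1, 1, N, 7] tensor; a decoding sketch under that assumed layout (verify it matches your model version):

```python
import numpy as np

def decode_detections(detections, w, h, conf_thresh=0.5):
    """Decode a [1, 1, N, 7] SSD output into pixel-space boxes.

    Each row i is (batch_id, class_id, confidence, x1, y1, x2, y2),
    with box coords normalized to [0, 1] -- the layout used by the
    OpenCV zoo face detector.
    """
    boxes = []
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf < conf_thresh:
            continue
        # Scale normalized coords back to image size.
        x1, y1, x2, y2 = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        boxes.append((int(x1), int(y1), int(x2), int(y2), conf))
    return boxes
```

Pair this with cv2.dnn.NMSBoxes to suppress overlapping detections of the same face.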

Embeddings (concept)

Crop the face, resize to the network input (often 112×112), run the backbone + embedding head. L2-normalize vectors so cosine similarity equals dot product.

Verification: cosine similarity

import torch
import torch.nn.functional as F

def l2n(x):
    return F.normalize(x, dim=1)

# e1, e2: [1, D] from your face encoder
sim = (l2n(e1) * l2n(e2)).sum(dim=1)
same_person = sim > 0.35  # threshold is model- and dataset-specific

Takeaways

  • Detection quality limits end-to-end accuracy—align before encoding when possible.
  • Use calibrated thresholds; report FAR/FRR for security-sensitive use.
  • Privacy: biometrics need consent, secure storage, and compliance (e.g. GDPR).
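The threshold-calibration point can be made concrete: given similarity scores for genuine (same-identity) and impostor (different-identity) pairs, FAR and FRR at a threshold are simple rates. A sketch, assuming the scores are already computed:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """False accept / false reject rates at a cosine-similarity threshold.

    genuine:  similarity scores of same-identity pairs
    impostor: similarity scores of different-identity pairs
    """
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    far = float(np.mean(impostor > threshold))   # impostors wrongly accepted
    frr = float(np.mean(genuine <= threshold))   # genuines wrongly rejected
    return far, frr
```

Sweep the threshold over held-out pairs and pick the operating point your application needs; security-sensitive systems typically fix a low FAR and accept the resulting FRR.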

Quick FAQ

How do you stop spoofing? Liveness detection (texture, depth, challenge-response) blocks printed-photo or screen-replay attacks.

Verification vs. identification? Verification is one-to-one (probe vs. claimed identity); identification is one-to-many search over a gallery (nearest neighbor in embedding space).
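The one-to-many case reduces to nearest-neighbor search over unit embeddings. A minimal sketch, where `gallery` is an [N, D] matrix with one enrolled identity per row and the threshold is model-specific, as in the verification example:

```python
import torch
import torch.nn.functional as F

def identify(probe, gallery, threshold=0.35):
    """Return (best_index, similarity), or (None, similarity) if no match.

    probe: [1, D] embedding; gallery: [N, D] matrix of enrolled embeddings.
    Both are L2-normalized here, so matmul gives cosine similarity.
    """
    sims = F.normalize(probe, dim=1) @ F.normalize(gallery, dim=1).T  # [1, N]
    best = int(sims.argmax())
    score = float(sims[0, best])
    return (best if score > threshold else None), score
```

For large galleries, the same dot-product search is usually delegated to an approximate nearest-neighbor index rather than a dense matmul.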