Computer Vision Chapter 41

Optical character recognition

OCR turns pixels of text into Unicode strings. Pipelines often split into text detection (find word/line boxes) and recognition (read characters inside each crop). Clean scans benefit from classical engines like Tesseract; natural scene images need deep detectors (EAST, DB) and sequence recognizers (CRNN, transformers). Below: OpenCV preprocessing and pytesseract calls in Python.

Preprocess for Tesseract

import cv2

gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)
_, th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Optional: morphology to close gaps in strokes
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
th = cv2.morphologyEx(th, cv2.MORPH_CLOSE, kernel)

Deskew, denoise, and contrast normalization often improve line-based OCR more than raw color photos.

pytesseract

# pip install pytesseract; install Tesseract OCR binary on PATH
import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.fromarray(th), lang="eng")
data = pytesseract.image_to_data(Image.fromarray(th), output_type=pytesseract.Output.DICT)

image_to_data returns per-word boxes and confidences for debugging.

Scene text (idea)

OpenCV’s DNN module can run frozen EAST text detection or ONNX recognition models: produce quadrilaterals or axis-aligned boxes, warp crops to fixed height, then run a recognition network. Training custom data yields better domain accuracy than generic English-only models.

Takeaways

  • Match engine to layout: printed forms vs street signs vs handwriting.
  • Evaluate with character/word accuracy on a held-out set.
  • Privacy: redact or avoid storing sensitive text without policy.

Quick FAQ

Install the matching tessdata pack and pass lang="hin+eng" (example) to pytesseract.

Tesseract is limited; use specialized HTR models or line-level sequence models trained on IAM-style corpora.