Computer Vision Interview
20 essential Q&A
Updated 2026
OCR
Optical Character Recognition: 20 Essential Q&A
From scanned documents to scene text—detection, reading order, and sequence models.
~11 min read
20 questions
Intermediate
detection · recognition · CTC · Tesseract
1
What is OCR?
⚡ easy
Answer: Converting images of text into machine-encoded text—includes layout, detection, and reading order for documents or natural scenes.
2
Detection vs recognition?
📊 medium
Answer: Detection finds where text is (boxes/polygons); recognition reads what the characters say—often separate stages, sometimes a unified model.
3
Scene text difficulties?
📊 medium
Answer: Arbitrary orientation, fonts, lighting, perspective, small size, and background clutter vs clean scanned pages.
4
How does Tesseract work (classic)?
📊 medium
Answer: Adaptive thresholding, connected components, line/word finding, then LSTM-based recognizer in modern versions—strong on clean scans.
import pytesseract  # requires the Tesseract binary installed on PATH
text = pytesseract.image_to_string(img)  # img: PIL Image or numpy array
5
Preprocessing?
⚡ easy
Answer: Deskew, denoise, binarization, contrast normalize—improves classical OCR; deep models learn invariances but still benefit from sane crops.
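One binarization step can be sketched in pure Python as global Otsu thresholding, which picks the cut that best separates dark ink from bright background (the flat `pixels` list of 0–255 grayscale values is an assumption; real pipelines operate on image arrays):

```python
def otsu_threshold(pixels):
    """Pick the threshold maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]                 # background weight grows with t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels):
    """Map pixels above the Otsu threshold to white (255), the rest to black."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```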
6
Character segmentation?
🔥 hard
Answer: Splitting cursive or touching characters is hard—sequence models avoid explicit per-char cuts via CTC or attention.
7
CRNN?
📊 medium
Answer: CNN feature extractor → RNN (e.g. BiLSTM) for sequence modeling → CTC or attention decoding—the classic pipeline for horizontal text lines; curved text usually adds a rectification module first.
8
What is CTC?
🔥 hard
Answer: Loss aligning variable-length outputs to labels without per-timestep alignment—blank symbol collapses repeats; fits OCR output length ≠ input width.
9
Attention decoders?
📊 medium
Answer: Autoregressive prediction with visual attention over feature map—handles irregular scripts; slower than CTC but flexible.
10
EAST / DB?
📊 medium
Answer: Single-shot detectors producing rotated boxes or shrink-based segmentation for text instances—fast scene-text detection.
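Detector output is typically scored by box overlap; a minimal axis-aligned IoU sketch (rotated-box IoU needs polygon clipping, omitted here):

```python
def iou(a, b):
    """Intersection-over-union for (x1, y1, x2, y2) axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```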
11
What is ICDAR?
⚡ easy
Answer: Competition/benchmark series for document and scene text—standard Hmean and edit-distance metrics across tasks.
12
Multilingual OCR?
📊 medium
Answer: Separate language models, script-specific normalizers, or Unicode output layer—training data must cover target scripts.
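A Unicode output layer usually pairs with canonical normalization so composed and decomposed forms of the same character compare equal; a sketch with the stdlib `unicodedata` (NFC as default form is an assumption):

```python
import unicodedata

def normalize_transcript(text, form="NFC"):
    """Canonically compose the string so 'e' + combining acute == 'é'."""
    return unicodedata.normalize(form, text)
```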
13
Document layout?
📊 medium
Answer: Tables, columns, reading order—needs layout analysis (Detectron-style or transformer LMs) beyond line OCR.
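A naive reading-order heuristic for line boxes, top-to-bottom then left-to-right, can be sketched by quantizing vertical centers into bands (the band tolerance is an assumption; real layout analysis also handles columns and tables):

```python
def reading_order(boxes, line_tol=10):
    """Sort (x, y, w, h) boxes into rough reading order."""
    def key(box):
        x, y, w, h = box
        cy = y + h / 2
        return (round(cy / line_tol), x)  # same row band, then left-to-right
    return sorted(boxes, key=key)
```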
14
End-to-end OCR?
🔥 hard
Answer: One network predicts boxes and text together (e.g. some transformer detectors)—reduces error propagation between stages.
15
Synthetic data?
⚡ easy
Answer: Render text on random backgrounds to pretrain detection/recognition—the domain gap to real photos usually requires fine-tuning.
16
Metrics?
📊 medium
Answer: Character error rate (CER), word error rate (WER), normalized edit distance—detection uses IoU + transcription match (Hmean).
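CER is Levenshtein edit distance over characters, normalized by reference length; a minimal dynamic-programming sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with a single rolling DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i          # prev holds the diagonal cell
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,       # deletion
                        dp[j - 1] + 1,   # insertion
                        prev + (r != h)) # substitution (free if match)
            prev = cur
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: edits normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

WER is the same computation applied to word tokens instead of characters.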
17
Handling blur/skew?
📊 medium
Answer: Super-resolution, rectification networks, or train with aggressive augmentations—geometric augment critical for robustness.
18
Handwriting?
🔥 hard
Answer: Higher intra-class variability—needs writer-independent features, larger datasets (IAM), often HMM/CTC or seq2seq.
19
Deployment?
⚡ easy
Answer: ONNX/TensorRT for speed; batch line images; language models for post-correction in search/product pipelines.
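A toy post-correction pass snaps each OCR token to the closest vocabulary word when it is similar enough (the vocabulary and cutoff are assumptions; production pipelines use real language models):

```python
import difflib

def correct_tokens(tokens, vocab, cutoff=0.7):
    """Replace each token with its nearest vocab word, if close enough."""
    fixed = []
    for t in tokens:
        match = difflib.get_close_matches(t.lower(), vocab, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else t)  # keep token if no close match
    return fixed
```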
20
TrOCR-style?
📊 medium
Answer: Vision encoder + text decoder pretrained on large image-text corpora—strong zero-shot/fine-tuned performance on documents without a classical pipeline.
OCR Cheat Sheet
Stages
- Detect → read
Sequence
- CRNN + CTC
- Attention
Metrics
- CER / WER
💡 Pro tip: Scene text needs strong detection; CTC avoids character cuts.
Full tutorial track
Go deeper with the matching tutorial chapter and code examples.