Edit Distance (Levenshtein)

The edit distance between two strings is the minimum number of single-character operations — insert, delete, or replace — to transform one into the other. This lesson covers the classic Levenshtein DP (LeetCode 72), variants, and full solutions in C, Java, and Python with trace tables.

Overview

Given strings word1 and word2, compute the minimum edits to make them equal. This is the foundation of spell-checking, DNA alignment, and fuzzy string matching.

State:  dp[i][j] = min edits to turn word1[0..i-1] into word2[0..j-1]

Base:   dp[i][0] = i   (delete i chars from word1 prefix)
        dp[0][j] = j   (insert j chars to match word2 prefix)

Match:   dp[i][j] = dp[i-1][j-1]
Mismatch: dp[i][j] = 1 + min(
              dp[i-1][j],     // delete from word1
              dp[i][j-1],     // insert into word1
              dp[i-1][j-1]    // replace
          )
Problem Allowed ops LeetCode
Edit distanceInsert, delete, replace (cost 1 each)72
Delete op for two stringsDelete only (no replace)583
One edit distanceAt most one operation?161
Min ASCII delete sumDelete only; sum char codes712

Edit Distance vs LCS

Aspect LCS Edit distance
On matchdp[i−1][j−1] + 1dp[i−1][j−1] (no cost)
On mismatchmax(top, left)1 + min(delete, insert, replace)
Base row/colAll zerosdp[i][0]=i, dp[0][j]=j
GoalMaximize common subsequenceMinimize edits to full equality
Connection: With insert/delete only (no replace), min edits = len(word1) + len(word2) − 2·LCS — see subsequence problems.

Levenshtein Distance — LeetCode 72

Problem: Return the minimum number of operations to convert word1 to word2.

Answer: dp[m][n] after filling the table.

Complexity: O(m × n) time, O(m × n) space (optimizable to O(n)).

Edit Distance Code (C, Java, Python)

Keywords Types Functions Variables Strings Numbers Comments

Python

def min_distance(word1, word2):
    """LeetCode 72 — Levenshtein edit distance. O(m × n)."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # base: transform prefix to / from empty string
    for i in range(m + 1):
        dp[i][0] = i   # i deletions
    for j in range(n + 1):
        dp[0][j] = j   # j insertions

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # chars match — no edit
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],       # delete word1[i-1]
                    dp[i][j - 1],       # insert word2[j-1]
                    dp[i - 1][j - 1]    # replace word1[i-1]
                )
    return dp[m][n]


if __name__ == "__main__":
    tests = [("horse", "ros"), ("intention", "execution"), ("", "a")]
    for w1, w2 in tests:
        print(f"minDistance('{w1}', '{w2}') = {min_distance(w1, w2)}")

Java

public class EditDistance {

    /** Levenshtein distance — classic 2D DP. */
    public static int minDistance(String word1, String word2) {
        int m = word1.length(), n = word2.length();
        int[][] dp = new int[m + 1][n + 1];

        for (int i = 0; i <= m; i++) dp[i][0] = i;
        for (int j = 0; j <= n; j++) dp[0][j] = j;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + Math.min(
                        dp[i - 1][j],
                        Math.min(dp[i][j - 1], dp[i - 1][j - 1])
                    );
                }
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(minDistance("horse", "ros"));           // 3
        System.out.println(minDistance("intention", "execution")); // 5
        System.out.println(minDistance("", "a"));                  // 1
    }
}

C

#include <stdio.h>
#include <string.h>

#define MAX 1001

int min3(int a, int b, int c) {
    if (a <= b && a <= c) return a;
    return b <= c ? b : c;
}

int edit_distance(char *word1, char *word2) {
    int dp[MAX][MAX];
    int m = (int)strlen(word1), n = (int)strlen(word2);
    int i, j;

    for (i = 0; i <= m; i++) dp[i][0] = i;
    for (j = 0; j <= n; j++) dp[0][j] = j;

    for (i = 1; i <= m; i++) {
        for (j = 1; j <= n; j++) {
            if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = 1 + min3(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
            }
        }
    }
    return dp[m][n];
}

int main(void) {
    printf("%d\n", edit_distance("horse", "ros"));           /* 3 */
    printf("%d\n", edit_distance("intention", "execution")); /* 5 */
    printf("%d\n", edit_distance("", "a"));                  /* 1 */
    return 0;
}

Output & Trace Tables

Program output (all three languages)

Language Input Output Explanation
Python"horse", "ros"3horse → rorse → rose → ros
Python"intention", "execution"5Classic textbook example
Python"", "a"1One insert into empty string
Java / Csame tests3, 5, 1Identical logic

DP table trace — word1 = "horse", word2 = "ros"

dp[i][j] "" r o s
""0123
h1123
o2212
r3222
s4332
e5443

Classic example — "kitten" → "sitting" (distance 3)

Step Operation Result string
Startkitten
1Replace k → ssitten
2Replace e → isittin
3Insert gsitting
Reading the table: First row/column counts insertions/deletions to reach empty prefix. Bottom-right cell 3 is the minimum edit distance for "horse" → "ros".

The Three Operations Explained

Operation DP move Meaning Example at (i,j)
Deletedp[i−1][j] + 1Remove word1[i−1]Drop extra char from source
Insertdp[i][j−1] + 1Add word2[j−1]Pad source to match target
Replacedp[i−1][j−1] + 1Swap mismatched charshorse: h→r at first position
Matchdp[i−1][j−1]Chars equal — freeBoth have 'r' at (r,r)

Edit Distance Variants

Variant Change to standard DP LeetCode
Delete only (two strings)On mismatch: min(dp[i−1][j], dp[i][j−1]) + 1 — no replace diagonal+1583
One edit distanceCheck if standard DP ≤ 1 or direct scan161
Weighted delete costAdd ASCII values instead of +1712
Follow word1 / word2Backtrack from dp[m][n] to print alignmentInterview follow-up

Space Optimization

Only the previous row is needed — reduce to O(n) space:

prev[j] and curr[j] for each row i = 1..m
  match:    curr[j] = prev[j-1]
  mismatch: curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1])

Keep prevRow[j-1] before overwrite when computing curr[j] (replace case)
Reconstruction: To print the actual edit script, keep the full 2D table (or store direction pointers) and backtrack from (m, n).

More Examples

Example 1: Identical strings

word1word2DistanceWhy
abcabc0Diagonal path — all matches

Example 2: Completely different (same length)

word1word2DistanceWhy
abcdef3Three replaces (or mix of ops)

Example 3: Complexity summary

ApproachTimeSpaceNotes
2D DPO(m × n)O(m × n)Standard interview solution
1D rollingO(m × n)O(n)Space optimized
Top-down memoO(m × n)O(m × n) + stackNatural recursive explanation
Naive recursionO(3^max(m,n))O(m+n)Try all ops — too slow

Approach Comparison

Approach Time Space When to use
Bottom-up 2DO(m × n)O(m × n)Whiteboard + backtracking
Bottom-up 1DO(m × n)O(n)Final optimized code
Top-down memoO(m × n)O(m × n)Explain recurrence recursively first
LCS reductionO(m × n)O(m × n)Delete-only variant (583)

Real-Life Applications

Domain Real scenario Edit distance role Notes
Spell checkSuggest closest dictionary word to typoRank candidates by Levenshtein distanceThreshold ≤ 2 for short words
AutocorrectMobile keyboard fixes "teh" → "the"Min edits among lexicon entriesOften combined with frequency
BioinformaticsDNA/protein sequence alignmentWeighted edit ops (gap penalties)Needleman–Wunsch variant
Plagiarism / dedupNear-duplicate document detectionLow edit distance → similar textToken-level or char-level
Search enginesFuzzy matching for queries"Levenstein" still finds "Levenshtein"Production uses optimized indexes
OCR / speechCompare recognized output to lexiconPick min-distance correctionNoisy input cleanup
Version controlDiff two file linesEdit script between stringsRelated to Myers diff
Record linkageMatch names across databases ("Jon" vs "John")Fuzzy string similarity scoreThreshold tuning per domain

Interview Tips

  1. Initialize first row and column — dp[i][0]=i, dp[0][j]=j — before the nested loops.
  2. State dp[i][j] meaning clearly: edits for prefixes, not full strings.
  3. On mismatch, write all three options: delete, insert, replace.
  4. Trace "horse"/"ros" or "kitten"/"sitting" on a small table.
  5. Distinguish from LCS: LCS maximizes match; edit distance minimizes cost to full equality.
  6. Follow-up: O(n) space with one row; backtrack to list operations.
  7. Delete-only (583) = no replace on mismatch; answer ties to LCS length.
  8. State O(m × n) time and space explicitly.

Key Takeaway

Edit distance fills dp[i][j] with min edits between prefixes: free on match, else 1 + min(delete, insert, replace). Base row/column count insertions/deletions from empty string. Same 2D grid shape as LCS — but mismatch uses three-way min, not max of neighbors.

Next up: Word break · Subsequence problems · DP properties & complexity