Word Break

The word break problem asks whether a string can be split into space-separated words from a dictionary — and optionally lists all valid segmentations. This lesson covers Word Break I (LeetCode 139), Word Break II (140), DP vs BFS, and full solutions in C, Java, and Python with trace tables.

Overview

Given a string s and a dictionary of words, determine if s can be built by concatenating dictionary words (each used as many times as needed unless stated otherwise).

Word break I pattern (boolean DP):
  dp[i] = True if s[0..i-1] can be segmented
  dp[0] = True   (empty prefix)

  for i = 1 .. n:
      for j = 0 .. i-1:
          if dp[j] and s[j..i) is in dictionary:
              dp[i] = True; break

  return dp[n]
Problem Goal Output LeetCode
Word Break ICan segment?True / False139
Word Break IIAll valid segmentationsList of sentences140
Word Break IIISegment with max words / min dict wordsCount or optimizePremium / interview
Concatenated wordsFind substrings that are concat of dict wordsList of indices472

Word Break I — Can Segment?

Problem (LeetCode 139): Return true if s can be segmented into dictionary words.

State: dp[i] = whether prefix s[0..i−1] is segmentable.

Transition: If some split point j has dp[j]==true and substring s[j..i) is a dictionary word, set dp[i]=true.

Complexity: O(n² × L) naive substring checks; O(n²) with hash set if word length bounded.

Optimization: Store dictionary in a set for O(1) lookup. Optionally cap inner loop to max word length in dictionary.

Word Break I Code (C, Java, Python)

Keywords Types Functions Variables Strings Numbers Comments

Python

def word_break(s, wordDict):
    """LeetCode 139 — can s be segmented using dictionary words?"""
    word_set = set(wordDict)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # empty string is valid

    for i in range(1, n + 1):
        for j in range(i):
            # prefix s[0:j] ok AND s[j:i] is a dictionary word
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[n]


if __name__ == "__main__":
    tests = [
        ("leetcode", ["leet", "code"]),
        ("applepenapple", ["apple", "pen"]),
        ("catsandog", ["cats", "dog", "sand", "and", "cat"]),
    ]
    for s, words in tests:
        print(f"wordBreak('{s}', {words}) → {word_break(s, words)}")

Java

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreakI {

    public static boolean wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        int n = s.length();
        boolean[] dp = new boolean[n + 1];
        dp[0] = true;

        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;
                }
            }
        }
        return dp[n];
    }

    public static void main(String[] args) {
        System.out.println(wordBreak("leetcode", List.of("leet", "code")));           // true
        System.out.println(wordBreak("applepenapple", List.of("apple", "pen")));      // true
        System.out.println(wordBreak("catsandog", List.of("cats", "dog", "sand", "and", "cat"))); // false
    }
}

C

#include <stdio.h>
#include <string.h>

#define MAX_S 1000
#define MAX_WORDS 1000
#define MAX_WLEN 32

/* Check if word exists in dictionary */
int in_dict(char *dict[], int dict_size, const char *word, int len) {
    int k;
    for (k = 0; k < dict_size; k++) {
        if ((int)strlen(dict[k]) == len && strncmp(dict[k], word, len) == 0)
            return 1;
    }
    return 0;
}

int word_break(char *s, char *dict[], int dict_size) {
    int dp[MAX_S + 1];
    int n = (int)strlen(s);
    int i, j;

    for (i = 0; i <= n; i++) dp[i] = 0;
    dp[0] = 1;

    for (i = 1; i <= n; i++) {
        for (j = 0; j < i; j++) {
            if (dp[j] && in_dict(dict, dict_size, s + j, i - j)) {
                dp[i] = 1;
                break;
            }
        }
    }
    return dp[n];
}

int main(void) {
    char *d1[] = {"leet", "code"};
    char *d2[] = {"apple", "pen"};
    char *d3[] = {"cats", "dog", "sand", "and", "cat"};
    printf("%d\n", word_break("leetcode", d1, 2));       /* 1 */
    printf("%d\n", word_break("applepenapple", d2, 2));  /* 1 */
    printf("%d\n", word_break("catsandog", d3, 5));     /* 0 */
    return 0;
}

Word Break I Output & Trace

Program output (all three languages)

Language Input Output Valid segmentation
Python"leetcode", ["leet","code"]True"leet code"
Python"applepenapple", ["apple","pen"]True"apple pen apple"
Python"catsandog", [...]FalseNo valid split
Java / Csame tests1, 1, 0Same logic

DP trace — s = "leetcode", dict = {leet, code}

Index i Prefix s[0..i−1] dp[i] Split j that worked Word s[j..i)
0""TrueBase case
1"l"FalseNot in dict
2"le"FalseNot in dict
3"lee"FalseNot in dict
4"leet"Truej=0"leet" ✓
5–7"leetc"…"leetcod"FalseNo valid extension
8"leetcode"Truej=4"code" ✓ → answer True

Word Break II — All Sentences

Problem (LeetCode 140): Return every valid space-separated sentence that can be formed.

Approach: Run Word Break I DP first (or memo) to know reachable indices, then backtrack from index 0 collecting words.

Example: s = "catsanddog", dict = {cat, cats, and, sand, dog} → ["cats and dog", "cat sand dog"]

Word Break II Code (Python)

Word Break II is most readable in Python with memoized backtracking; Java/C follow the same pattern.

def word_break_ii(s, wordDict):
    """LeetCode 140 — return all valid segmentations."""
    word_set = set(wordDict)
    n = len(s)
    memo = {}

    def backtrack(start):
        if start == n:
            return [""]
        if start in memo:
            return memo[start]

        result = []
        for end in range(start + 1, n + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentence = word if rest == "" else word + " " + rest
                    result.append(sentence)
        memo[start] = result
        return result

    return backtrack(0)


if __name__ == "__main__":
    print(word_break_ii("catsanddog", ["cat", "cats", "and", "sand", "dog"]))
    # ['cats and dog', 'cat sand dog']

Word Break II output

Input Output sentences Count
"catsanddog"["cats and dog", "cat sand dog"]2
"pineapplepenapple"["pine apple pen apple", "pineapple pen apple", "pine applepen apple"]3

DP vs BFS vs Trie

Approach Idea Time Best for
Boolean DPdp[i] reachable prefixesO(n²)LeetCode 139 — standard
BFSQueue of indices; edges = dict wordsO(n²)Same answer; graph view
Trie + DP/BFSTrie for prefix pruningO(n²) with better constantsLarge dictionaries
Backtracking + memoBuild all sentencesExponential outputLeetCode 140
Graph view: Indices 0…n are nodes; edge i→j exists if s[i..j) is in the dictionary. Word Break I = path from 0 to n.

More Examples

Example 1: Overlapping dictionary words

sDictionaryWord Break IWhy
aaaaaaa["aaaa","aaa"]TrueMultiple overlapping splits work

Example 2: Greedy fails

sDictionaryGreedy longest matchDP answer
cars["car","ca","rs"]"car" then stuck on "s"True via "ca" + "rs"

Example 3: Complexity summary

ProblemTimeSpaceNotes
Word Break I (DP)O(n²)O(n)n = |s|
Word Break IIO(2^n) worstO(2^n) outputCan explode with many splits
With max word len WO(n × W)O(n)Cap inner loop

Approach Comparison

Approach Time Space When to use
1D boolean DPO(n²)O(n)Default for Word Break I
BFS on indicesO(n²)O(n)When graph framing helps explain
Memoized backtrackO(n² + output)O(n²)Word Break II
Naive recursionO(2^n)O(n)Never in interviews

Real-Life Applications

Domain Real scenario Word break mapping Variant
NLP — Chinese/JapaneseSegment text with no spaces into wordsString + dictionary → valid splitsWord Break I/II
Speech / OCRJoin character stream into known wordsDP over character indicesBoolean DP
URL / path parsingSplit compound tokens at dictionary boundariesSegment camelCase or slugsCustom dict
BioinformaticsParse sequences against known motifsMotif dictionary segmentationSimilar DP
Keyboard / IMEPredict word boundaries while typingPrefix reachable checksReal-time DP
CompilersLexical analysis — tokenize source codeLongest-match vs full segmentationTrie + automata
Data validationCheck if user input matches allowed phrase patternsDict = allowed tokensWord Break I
Search suggestionsCompound query expansionGenerate all phrase splits (II)Backtracking

Interview Tips

  1. Clarify: return boolean only (139) or all sentences (140)? Words reusable?
  2. Define dp[i] = can prefix s[0..i−1] be segmented?
  3. Put dictionary in a HashSet before the DP loop.
  4. Trace "leetcode" on whiteboard — show dp[4] and dp[8] becoming true.
  5. Mention greedy failure ("cars" example) if asked why not greedy.
  6. For II: DP prune + backtrack; warn about exponential output size.
  7. Optional: BFS formulation — same complexity, different mental model.
  8. Connect to edit distance and LCS as other string DP families.

Key Takeaway

Word Break I is 1D boolean DP: dp[i] is true if some j < i has dp[j] true and s[j..i) in the dictionary. Word Break II adds backtracking (with memo) from reachable indices. Think “path in a graph from 0 to n” — DP, BFS, and trie all segment the same idea.

Next up: Jump game · Edit distance · Subsequence problems