Word Break
The word break problem asks whether a string can be split into space-separated words from a dictionary — and optionally lists all valid segmentations. This lesson covers Word Break I (LeetCode 139), Word Break II (140), DP vs BFS, and full solutions in C, Java, and Python with trace tables.
Overview
Given a string s and a dictionary of words, determine if s can be built by concatenating dictionary words (each used as many times as needed unless stated otherwise).
Word break I pattern (boolean DP):
dp[i] = True if s[0..i-1] can be segmented
dp[0] = True (empty prefix)
for i = 1 .. n:
for j = 0 .. i-1:
if dp[j] and s[j..i) is in dictionary:
dp[i] = True; break
return dp[n]
| Problem | Goal | Output | LeetCode |
|---|---|---|---|
| Word Break I | Can segment? | True / False | 139 |
| Word Break II | All valid segmentations | List of sentences | 140 |
| Word Break III | Segment with max words / min dict words | Count or optimize | Premium / interview |
| Concatenated words | Find substrings that are concat of dict words | List of indices | 472 |
Word Break I — Can Segment?
Problem (LeetCode 139): Return true if s can be segmented into dictionary words.
State: dp[i] = whether prefix s[0..i−1] is segmentable.
Transition: If some split point j has dp[j]==true and substring s[j..i) is a dictionary word, set dp[i]=true.
Complexity: O(n² × L) naive substring checks; O(n²) with hash set if word length bounded.
set for O(1) lookup. Optionally cap inner loop to max word length in dictionary.
Word Break I Code (C, Java, Python)
Python
def word_break(s, wordDict):
"""LeetCode 139 — can s be segmented using dictionary words?"""
word_set = set(wordDict)
n = len(s)
dp = [False] * (n + 1)
dp[0] = True # empty string is valid
for i in range(1, n + 1):
for j in range(i):
# prefix s[0:j] ok AND s[j:i] is a dictionary word
if dp[j] and s[j:i] in word_set:
dp[i] = True
break
return dp[n]
if __name__ == "__main__":
tests = [
("leetcode", ["leet", "code"]),
("applepenapple", ["apple", "pen"]),
("catsandog", ["cats", "dog", "sand", "and", "cat"]),
]
for s, words in tests:
print(f"wordBreak('{s}', {words}) → {word_break(s, words)}")
Java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class WordBreakI {
public static boolean wordBreak(String s, List<String> wordDict) {
Set<String> wordSet = new HashSet<>(wordDict);
int n = s.length();
boolean[] dp = new boolean[n + 1];
dp[0] = true;
for (int i = 1; i <= n; i++) {
for (int j = 0; j < i; j++) {
if (dp[j] && wordSet.contains(s.substring(j, i))) {
dp[i] = true;
break;
}
}
}
return dp[n];
}
public static void main(String[] args) {
System.out.println(wordBreak("leetcode", List.of("leet", "code"))); // true
System.out.println(wordBreak("applepenapple", List.of("apple", "pen"))); // true
System.out.println(wordBreak("catsandog", List.of("cats", "dog", "sand", "and", "cat"))); // false
}
}
C
#include <stdio.h>
#include <string.h>
#define MAX_S 1000
#define MAX_WORDS 1000
#define MAX_WLEN 32
/* Check if word exists in dictionary */
int in_dict(char *dict[], int dict_size, const char *word, int len) {
int k;
for (k = 0; k < dict_size; k++) {
if ((int)strlen(dict[k]) == len && strncmp(dict[k], word, len) == 0)
return 1;
}
return 0;
}
int word_break(char *s, char *dict[], int dict_size) {
int dp[MAX_S + 1];
int n = (int)strlen(s);
int i, j;
for (i = 0; i <= n; i++) dp[i] = 0;
dp[0] = 1;
for (i = 1; i <= n; i++) {
for (j = 0; j < i; j++) {
if (dp[j] && in_dict(dict, dict_size, s + j, i - j)) {
dp[i] = 1;
break;
}
}
}
return dp[n];
}
int main(void) {
char *d1[] = {"leet", "code"};
char *d2[] = {"apple", "pen"};
char *d3[] = {"cats", "dog", "sand", "and", "cat"};
printf("%d\n", word_break("leetcode", d1, 2)); /* 1 */
printf("%d\n", word_break("applepenapple", d2, 2)); /* 1 */
printf("%d\n", word_break("catsandog", d3, 5)); /* 0 */
return 0;
}
Word Break I Output & Trace
Program output (all three languages)
| Language | Input | Output | Valid segmentation |
|---|---|---|---|
| Python | "leetcode", ["leet","code"] | True | "leet code" |
| Python | "applepenapple", ["apple","pen"] | True | "apple pen apple" |
| Python | "catsandog", [...] | False | No valid split |
| Java / C | same tests | 1, 1, 0 | Same logic |
DP trace — s = "leetcode", dict = {leet, code}
| Index i | Prefix s[0..i−1] | dp[i] | Split j that worked | Word s[j..i) |
|---|---|---|---|---|
| 0 | "" | True | — | Base case |
| 1 | "l" | False | — | Not in dict |
| 2 | "le" | False | — | Not in dict |
| 3 | "lee" | False | — | Not in dict |
| 4 | "leet" | True | j=0 | "leet" ✓ |
| 5–7 | "leetc"…"leetcod" | False | — | No valid extension |
| 8 | "leetcode" | True | j=4 | "code" ✓ → answer True |
Word Break II — All Sentences
Problem (LeetCode 140): Return every valid space-separated sentence that can be formed.
Approach: Run Word Break I DP first (or memo) to know reachable indices, then backtrack from index 0 collecting words.
Example: s = "catsanddog", dict = {cat, cats, and, sand, dog} → ["cats and dog", "cat sand dog"]
Word Break II Code (Python)
Word Break II is most readable in Python with memoized backtracking; Java/C follow the same pattern.
def word_break_ii(s, wordDict):
"""LeetCode 140 — return all valid segmentations."""
word_set = set(wordDict)
n = len(s)
memo = {}
def backtrack(start):
if start == n:
return [""]
if start in memo:
return memo[start]
result = []
for end in range(start + 1, n + 1):
word = s[start:end]
if word in word_set:
for rest in backtrack(end):
sentence = word if rest == "" else word + " " + rest
result.append(sentence)
memo[start] = result
return result
return backtrack(0)
if __name__ == "__main__":
print(word_break_ii("catsanddog", ["cat", "cats", "and", "sand", "dog"]))
# ['cats and dog', 'cat sand dog']
Word Break II output
| Input | Output sentences | Count |
|---|---|---|
| "catsanddog" | ["cats and dog", "cat sand dog"] | 2 |
| "pineapplepenapple" | ["pine apple pen apple", "pineapple pen apple", "pine applepen apple"] | 3 |
DP vs BFS vs Trie
| Approach | Idea | Time | Best for |
|---|---|---|---|
| Boolean DP | dp[i] reachable prefixes | O(n²) | LeetCode 139 — standard |
| BFS | Queue of indices; edges = dict words | O(n²) | Same answer; graph view |
| Trie + DP/BFS | Trie for prefix pruning | O(n²) with better constants | Large dictionaries |
| Backtracking + memo | Build all sentences | Exponential output | LeetCode 140 |
s[i..j) is in the dictionary. Word Break I = path from 0 to n.
More Examples
Example 1: Overlapping dictionary words
| s | Dictionary | Word Break I | Why |
|---|---|---|---|
| aaaaaaa | ["aaaa","aaa"] | True | Multiple overlapping splits work |
Example 2: Greedy fails
| s | Dictionary | Greedy longest match | DP answer |
|---|---|---|---|
| cars | ["car","ca","rs"] | "car" then stuck on "s" | True via "ca" + "rs" |
Example 3: Complexity summary
| Problem | Time | Space | Notes |
|---|---|---|---|
| Word Break I (DP) | O(n²) | O(n) | n = |s| |
| Word Break II | O(2^n) worst | O(2^n) output | Can explode with many splits |
| With max word len W | O(n × W) | O(n) | Cap inner loop |
Approach Comparison
| Approach | Time | Space | When to use |
|---|---|---|---|
| 1D boolean DP | O(n²) | O(n) | Default for Word Break I |
| BFS on indices | O(n²) | O(n) | When graph framing helps explain |
| Memoized backtrack | O(n² + output) | O(n²) | Word Break II |
| Naive recursion | O(2^n) | O(n) | Never in interviews |
Real-Life Applications
| Domain | Real scenario | Word break mapping | Variant |
|---|---|---|---|
| NLP — Chinese/Japanese | Segment text with no spaces into words | String + dictionary → valid splits | Word Break I/II |
| Speech / OCR | Join character stream into known words | DP over character indices | Boolean DP |
| URL / path parsing | Split compound tokens at dictionary boundaries | Segment camelCase or slugs | Custom dict |
| Bioinformatics | Parse sequences against known motifs | Motif dictionary segmentation | Similar DP |
| Keyboard / IME | Predict word boundaries while typing | Prefix reachable checks | Real-time DP |
| Compilers | Lexical analysis — tokenize source code | Longest-match vs full segmentation | Trie + automata |
| Data validation | Check if user input matches allowed phrase patterns | Dict = allowed tokens | Word Break I |
| Search suggestions | Compound query expansion | Generate all phrase splits (II) | Backtracking |
Interview Tips
- Clarify: return boolean only (139) or all sentences (140)? Words reusable?
- Define
dp[i]= can prefixs[0..i−1]be segmented? - Put dictionary in a
HashSetbefore the DP loop. - Trace "leetcode" on whiteboard — show dp[4] and dp[8] becoming true.
- Mention greedy failure ("cars" example) if asked why not greedy.
- For II: DP prune + backtrack; warn about exponential output size.
- Optional: BFS formulation — same complexity, different mental model.
- Connect to edit distance and LCS as other string DP families.
Key Takeaway
Word Break I is 1D boolean DP: dp[i] is true if some j < i has dp[j] true and s[j..i) in the dictionary. Word Break II adds backtracking (with memo) from reachable indices. Think “path in a graph from 0 to n” — DP, BFS, and trie all segment the same idea.
Next up: Jump game · Edit distance · Subsequence problems