Word Break

The word break problem asks whether a string can be split into space-separated words from a dictionary — and optionally lists all valid segmentations. This lesson covers Word Break I (LeetCode 139), Word Break II (140), DP vs BFS, and full solutions in C, Java, and Python with trace tables.

Overview

Given a string s and a dictionary of words, determine if s can be built by concatenating dictionary words (each used as many times as needed unless stated otherwise).

Word break I pattern (boolean DP):
  dp[i] = True if s[0..i-1] can be segmented
  dp[0] = True   (empty prefix)

  for i = 1 .. n:
      for j = 0 .. i-1:
          if dp[j] and s[j..i) is in dictionary:
              dp[i] = True; break

  return dp[n]

Problem	Goal	Output	LeetCode
Word Break I	Can segment?	True / False	139
Word Break II	All valid segmentations	List of sentences	140
Word Break III	Segment with max words / min dict words	Count or optimize	Premium / interview
Concatenated words	Find substrings that are concat of dict words	List of indices	472

Word Break I — Can Segment?

Problem (LeetCode 139): Return true if s can be segmented into dictionary words.

State: dp[i] = whether prefix s[0..i−1] is segmentable.

Transition: If some split point j has dp[j]==true and substring s[j..i) is a dictionary word, set dp[i]=true.

Complexity: O(n² × L) naive substring checks; O(n²) with hash set if word length bounded.

Optimization: Store dictionary in a set for O(1) lookup. Optionally cap inner loop to max word length in dictionary.

Word Break I Code (C, Java, Python)

Keywords Types Functions Variables Strings Numbers Comments

Python

def word_break(s, wordDict):
    """LeetCode 139 — can s be segmented using dictionary words?"""
    word_set = set(wordDict)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # empty string is valid

    for i in range(1, n + 1):
        for j in range(i):
            # prefix s[0:j] ok AND s[j:i] is a dictionary word
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[n]


if __name__ == "__main__":
    tests = [
        ("leetcode", ["leet", "code"]),
        ("applepenapple", ["apple", "pen"]),
        ("catsandog", ["cats", "dog", "sand", "and", "cat"]),
    ]
    for s, words in tests:
        print(f"wordBreak('{s}', {words}) → {word_break(s, words)}")

Java

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreakI {

    public static boolean wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        int n = s.length();
        boolean[] dp = new boolean[n + 1];
        dp[0] = true;

        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;
                }
            }
        }
        return dp[n];
    }

    public static void main(String[] args) {
        System.out.println(wordBreak("leetcode", List.of("leet", "code")));           // true
        System.out.println(wordBreak("applepenapple", List.of("apple", "pen")));      // true
        System.out.println(wordBreak("catsandog", List.of("cats", "dog", "sand", "and", "cat"))); // false
    }
}

C

#include <stdio.h>
#include <string.h>

#define MAX_S 1000
#define MAX_WORDS 1000
#define MAX_WLEN 32

/* Check if word exists in dictionary */
int in_dict(char *dict[], int dict_size, const char *word, int len) {
    int k;
    for (k = 0; k < dict_size; k++) {
        if ((int)strlen(dict[k]) == len && strncmp(dict[k], word, len) == 0)
            return 1;
    }
    return 0;
}

int word_break(char *s, char *dict[], int dict_size) {
    int dp[MAX_S + 1];
    int n = (int)strlen(s);
    int i, j;

    for (i = 0; i <= n; i++) dp[i] = 0;
    dp[0] = 1;

    for (i = 1; i <= n; i++) {
        for (j = 0; j < i; j++) {
            if (dp[j] && in_dict(dict, dict_size, s + j, i - j)) {
                dp[i] = 1;
                break;
            }
        }
    }
    return dp[n];
}

int main(void) {
    char *d1[] = {"leet", "code"};
    char *d2[] = {"apple", "pen"};
    char *d3[] = {"cats", "dog", "sand", "and", "cat"};
    printf("%d\n", word_break("leetcode", d1, 2));       /* 1 */
    printf("%d\n", word_break("applepenapple", d2, 2));  /* 1 */
    printf("%d\n", word_break("catsandog", d3, 5));     /* 0 */
    return 0;
}

Word Break I Output & Trace

Program output (all three languages)

Language	Input	Output	Valid segmentation
Python	"leetcode", ["leet","code"]	True	"leet code"
Python	"applepenapple", ["apple","pen"]	True	"apple pen apple"
Python	"catsandog", [...]	False	No valid split
Java / C	same tests	1, 1, 0	Same logic

DP trace — s = "leetcode", dict = {leet, code}

Index i	Prefix s[0..i−1]	dp[i]	Split j that worked	Word s[j..i)
0	""	True	—	Base case
1	"l"	False	—	Not in dict
2	"le"	False	—	Not in dict
3	"lee"	False	—	Not in dict
4	"leet"	True	j=0	"leet" ✓
5–7	"leetc"…"leetcod"	False	—	No valid extension
8	"leetcode"	True	j=4	"code" ✓ → answer True

Word Break II — All Sentences

Problem (LeetCode 140): Return every valid space-separated sentence that can be formed.

Approach: Run Word Break I DP first (or memo) to know reachable indices, then backtrack from index 0 collecting words.

Example: s = "catsanddog", dict = {cat, cats, and, sand, dog} → ["cats and dog", "cat sand dog"]

Word Break II Code (Python)

Word Break II is most readable in Python with memoized backtracking; Java/C follow the same pattern.

def word_break_ii(s, wordDict):
    """LeetCode 140 — return all valid segmentations."""
    word_set = set(wordDict)
    n = len(s)
    memo = {}

    def backtrack(start):
        if start == n:
            return [""]
        if start in memo:
            return memo[start]

        result = []
        for end in range(start + 1, n + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentence = word if rest == "" else word + " " + rest
                    result.append(sentence)
        memo[start] = result
        return result

    return backtrack(0)


if __name__ == "__main__":
    print(word_break_ii("catsanddog", ["cat", "cats", "and", "sand", "dog"]))
    # ['cats and dog', 'cat sand dog']

Word Break II output

Input	Output sentences	Count
"catsanddog"	["cats and dog", "cat sand dog"]	2
"pineapplepenapple"	["pine apple pen apple", "pineapple pen apple", "pine applepen apple"]	3

DP vs BFS vs Trie

Approach	Idea	Time	Best for
Boolean DP	`dp[i]` reachable prefixes	O(n²)	LeetCode 139 — standard
BFS	Queue of indices; edges = dict words	O(n²)	Same answer; graph view
Trie + DP/BFS	Trie for prefix pruning	O(n²) with better constants	Large dictionaries
Backtracking + memo	Build all sentences	Exponential output	LeetCode 140

Graph view: Indices 0…n are nodes; edge i→j exists if s[i..j) is in the dictionary. Word Break I = path from 0 to n.

More Examples

Example 1: Overlapping dictionary words

s	Dictionary	Word Break I	Why
aaaaaaa	["aaaa","aaa"]	True	Multiple overlapping splits work

Example 2: Greedy fails

s	Dictionary	Greedy longest match	DP answer
cars	["car","ca","rs"]	"car" then stuck on "s"	True via "ca" + "rs"

Example 3: Complexity summary

Problem	Time	Space	Notes
Word Break I (DP)	O(n²)	O(n)	n = \|s\|
Word Break II	O(2^n) worst	O(2^n) output	Can explode with many splits
With max word len W	O(n × W)	O(n)	Cap inner loop

Approach Comparison

Approach	Time	Space	When to use
1D boolean DP	O(n²)	O(n)	Default for Word Break I
BFS on indices	O(n²)	O(n)	When graph framing helps explain
Memoized backtrack	O(n² + output)	O(n²)	Word Break II
Naive recursion	O(2^n)	O(n)	Never in interviews

Real-Life Applications

Domain	Real scenario	Word break mapping	Variant
NLP — Chinese/Japanese	Segment text with no spaces into words	String + dictionary → valid splits	Word Break I/II
Speech / OCR	Join character stream into known words	DP over character indices	Boolean DP
URL / path parsing	Split compound tokens at dictionary boundaries	Segment camelCase or slugs	Custom dict
Bioinformatics	Parse sequences against known motifs	Motif dictionary segmentation	Similar DP
Keyboard / IME	Predict word boundaries while typing	Prefix reachable checks	Real-time DP
Compilers	Lexical analysis — tokenize source code	Longest-match vs full segmentation	Trie + automata
Data validation	Check if user input matches allowed phrase patterns	Dict = allowed tokens	Word Break I
Search suggestions	Compound query expansion	Generate all phrase splits (II)	Backtracking

Interview Tips

Clarify: return boolean only (139) or all sentences (140)? Words reusable?
Define dp[i] = can prefix s[0..i−1] be segmented?
Put dictionary in a HashSet before the DP loop.
Trace "leetcode" on whiteboard — show dp[4] and dp[8] becoming true.
Mention greedy failure ("cars" example) if asked why not greedy.
For II: DP prune + backtrack; warn about exponential output size.
Optional: BFS formulation — same complexity, different mental model.
Connect to edit distance and LCS as other string DP families.

Key Takeaway

Word Break I is 1D boolean DP: dp[i] is true if some j < i has dp[j] true and s[j..i) in the dictionary. Word Break II adds backtracking (with memo) from reachable indices. Think “path in a graph from 0 to n” — DP, BFS, and trie all segment the same idea.

Next up: Jump game · Edit distance · Subsequence problems