Word Break Problem Using Trie in Kotlin

The word break problem is a classic problem in computer science that involves breaking a given string into words using a dictionary. Given a string s and a dictionary of words, the goal of the word break problem is to determine whether s can be segmented into a space-separated sequence of one or more dictionary words.

For example, given the string “leetcode” and the dictionary [“leet”, “code”], we can segment the string into “leet code”, which consists of two words in the dictionary. However, if we are given the string “applepenapple” and the dictionary [“apple”, “pen”], we cannot segment the string into words in the dictionary.

Solving the word break problem can be useful in a variety of applications, such as natural language processing, text indexing, and text completion. It can be solved using dynamic programming or recursion with memoization.

The word break problem can be solved using a Trie data structure. Here’s an example Kotlin code that uses a Trie to solve the word break problem:

Kotlin
class TrieNode {
    val children = mutableMapOf<Char, TrieNode>()
    var isWord = false
}

class Trie {
    val root = TrieNode()
    
    fun insert(word: String) {
        var curr = root
        for (c in word) {
            curr = curr.children.getOrPut(c) { TrieNode() }
        }
        curr.isWord = true
    }
    
    fun search(word: String): Boolean {
        var curr = root
        for (c in word) {
            curr = curr.children[c] ?: return false
        }
        return curr.isWord
    }
}

fun wordBreak(s: String, wordDict: List<String>): Boolean {
    val trie = Trie()
    for (word in wordDict) {
        trie.insert(word)
    }
    
    val n = s.length
    val dp = BooleanArray(n + 1) { false }
    dp[0] = true
    
    for (i in 1..n) {
        var curr = trie.root
        for (j in i downTo 1) {
            curr = curr.children[s[j - 1]] ?: break
            if (curr.isWord && dp[j - 1]) {
                dp[i] = true
                break
            }
        }
    }
    
    return dp[0]
}
fun main() {
    val s = "kotlincode"
    val dict = listOf("kotlin", "code")
    val canSegment = wordBreak(s, dict)
    if (canSegment) {
        println("The string \"$s\" can be segmented into words in the dictionary.")
    } else {
        println("The string \"$s\" cannot be segmented into words in the dictionary.")
    }
}

The TrieNode class represents a node in the Trie data structure, which consists of a map of children nodes and a flag indicating whether the node represents the end of a word. The Trie class implements the Trie data structure with insertion and search operations.

The wordBreak function takes a string s and a list of words wordDict as input, and returns a boolean value indicating whether s can be segmented into words in wordDict. The function first creates a Trie and inserts all the words in wordDict into the Trie.

The function then initializes an array dp of Boolean values with length n + 1, where n is the length of the input string s. The value dp[i] represents whether the substring s[0..i-1] can be segmented into words in wordDict.

The function then iterates over each index i in the input string, and for each index, it iterates over each index j less than or equal to i. For each pair (i, j), the function checks if the substring s[j..i-1] is in the Trie and if dp[j - 1] is true. If both conditions are true, the function sets dp[i] to true and breaks out of the inner loop.

After the loops complete, the function returns the value dp[n], which indicates whether the entire input string s can be segmented into words in wordDict.

main function that uses the wordBreak function with the string “kotlincode” and a dictionary that contains the words “kotlin” and “code”,

In this example, we use the string “kotlincode” and a dictionary that contains the words “kotlin” and “code”. The wordBreak function is used to check whether the string can be segmented into words in the dictionary. The result is printed to the console depending on whether the string can be segmented or not.

To explain the example with a visual, we can create a Trie data structure that contains the words “kotlin” and “code”:

    root
     |
     k
     |
     o
     |
     t (isWord)
     |
     l
     |
     i
     |
     n (isWord)
     |
     c
     |
     o
     |
     d
     |
     e (isWord)

In this Trie, each node represents a prefix of a word in the dictionary, and the isWord flag is set to true at the last node of each word. To check whether the string “kotlincode” can be segmented into words in the dictionary, we can use the following dynamic programming table:

0 1 2 3 4 5 6 7 8 9 10


0 |T
1 |F
2 |F
3 |F
4 |F
5 |F
6 |F
7 |F
8 |F
9 |F
10|F

The rows of the table represent the indices of the input string, and the columns represent the indices of the previous substrings that have been checked. The value dp[i][j] is true if the substring s[j..i-1] can be segmented into words in the dictionary, and false otherwise.

The first row of the table is initialized to true because an empty substring can always be segmented. The remaining rows of the table are computed using the following recurrence relation:

dp[i][j] = dp[j - 1][0] && trie.contains(s[j..i-1])

This means that the substring s[j..i-1] can be segmented if the previous substring s[0..j-2] can be segmented (dp[j - 1][0]) and the substring s[j..i-1] is in the Trie (trie.contains(s[j..i-1])).

Using this recurrence relation, we can fill in the rest of the table as follows:

  0 1 2 3 4 5 6 7 8 9 10
  ----------------------
0|T
1|F T
2|F F T
3|F F F T
4|F F F F T
5|F F F F F T
6|F F F F F F T
7|F F F F F F F T
8|F F F F F F F F T
9|F F F F F F F F F T

Leave a Comment