Huffman tree, also known as Huffman coding, is a data compression algorithm used to encode data in a way that reduces the amount of space required to store it. The algorithm was developed by David A. Huffman in 1952.
The idea behind Huffman coding is to use variable-length codes for different characters, with shorter codes used for more frequently occurring characters and longer codes used for less frequently occurring characters. This way, the total number of bits required to store the data can be reduced.
To create a Huffman tree, the frequency of occurrence of each character in the data is first calculated. The characters are then sorted based on their frequency, with the least frequent characters at the bottom. A binary tree is then constructed, with each leaf node representing a character and its frequency, and the internal nodes representing the sum of the frequencies of its child nodes.
Starting from the bottom of the tree, each leaf node is assigned a binary code by traversing up the tree to its root, where 0 is assigned to the left child and 1 to the right child. The resulting binary codes for each character are then concatenated to create the compressed data.
When the compressed data is decompressed, the binary codes are read one at a time and traversed down the Huffman tree until a leaf node is reached, at which point the corresponding character is output.
Kotlin code implements Huffman coding :
import java.util.*
class HuffmanNode(val char: Char, val freq: Int, val left: HuffmanNode? = null, val right: HuffmanNode? = null) : Comparable<HuffmanNode> {
override fun compareTo(other: HuffmanNode): Int {
return this.freq - other.freq
}
}
HuffmanNode
class represents a node in the Huffman tree. It has three properties: char
for the character represented by the node, freq
for the frequency of the character, and left
and right
for the left and right children of the node.
compareTo
function is overridden to enable comparison of HuffmanNode
objects based on their frequency.
fun buildHuffmanTree(charFreq: IntArray): HuffmanNode? {
val priorityQueue = PriorityQueue<HuffmanNode>()
for (i in 0 until charFreq.size) {
if (charFreq[i] > 0) {
priorityQueue.add(HuffmanNode(i.toChar(), charFreq[i]))
}
}
while (priorityQueue.size > 1) {
val left = priorityQueue.poll()
val right = priorityQueue.poll()
priorityQueue.add(HuffmanNode('\u0000', left.freq + right.freq, left, right))
}
return priorityQueue.poll()
}
buildHuffmanTree
function takes an array of character frequencies as input and returns the root node of the Huffman tree. It first creates a PriorityQueue
to store the HuffmanNode
objects. Then, for each non-zero frequency in the input, it creates a new HuffmanNode
object and adds it to the priority queue. Next, it repeatedly extracts the two nodes with the lowest frequency from the priority queue, creates a new parent node with their sum frequency, and adds the parent node back to the priority queue. This continues until there is only one node remaining in the priority queue, which is the root node of the Huffman tree.
fun buildCodeTable(root: HuffmanNode?, code: String, codeTable: MutableMap<Char, String>) {
if (root?.left == null && root?.right == null) {
codeTable[root?.char?:'\u0000'] = code
} else {
buildCodeTable(root.left, code + "0", codeTable)
buildCodeTable(root.right, code + "1", codeTable)
}
}
buildCodeTable
function takes the root node of a Huffman tree, a string code
representing the binary code for the current node, and a mutable map codeTable
to store the binary codes for each character. It recursively traverses the Huffman tree and stores the binary code for each leaf node in the codeTable
map. The binary code for a leaf node is obtained by concatenating the binary code for its parent node with either “0” or “1”, depending on whether the leaf node is a left or right child.
fun compress(input: String): String {
val charFreq = IntArray(256)
for (c in input) {
charFreq[c.toInt()]++
}
val root = buildHuffmanTree(charFreq)
val codeTable = mutableMapOf<Char, String>()
buildCodeTable(root, "", codeTable)
var output = ""
for (c in input) {
output += codeTable[c] ?: ""
}
return output
}
compress
function takes a string input
and returns a compressed binary string using Huffman coding. It first creates an array charFreq
to store the frequency of each character in the input string. It then calls buildHuffmanTree
to build a Huffman tree from the character frequencies and buildCodeTable
to generate a binary code for each character. Finally, it loops through the input string, looks up the binary code for each character in the codeTable
map, and concatenates them to the output binary string.
fun decompress(input: String, root: HuffmanNode?): String {
var output = ""
var node = root
for (c in input) {
node = if (c == '0') node?.left else node?.right
if (node?.left == null && node?.right == null) {
output += node?.char
node = root
}
}
return output
}
The decompress
function takes a compressed binary string input
and the root node of a Huffman tree, and returns the original uncompressed string. It loops through the input binary string and traverses the Huffman tree by following the path of “0” or “1” bits. When it reaches a leaf node, it adds the character represented by the node to the output string and resets the current node to the root node.
import java.util.*
class HuffmanNode(val char: Char, val freq: Int, val left: HuffmanNode? = null, val right: HuffmanNode? = null) : Comparable<HuffmanNode> {
override fun compareTo(other: HuffmanNode): Int {
return this.freq - other.freq
}
}
fun buildHuffmanTree(charFreq: IntArray): HuffmanNode? {
val priorityQueue = PriorityQueue<HuffmanNode>()
for (i in 0 until charFreq.size) {
if (charFreq[i] > 0) {
priorityQueue.add(HuffmanNode(i.toChar(), charFreq[i]))
}
}
while (priorityQueue.size > 1) {
val left = priorityQueue.poll()
val right = priorityQueue.poll()
priorityQueue.add(HuffmanNode('\u0000', left.freq + right.freq, left, right))
}
return priorityQueue.poll()
}
fun buildCodeTable(root: HuffmanNode?, code: String, codeTable: MutableMap<Char, String>) {
if (root?.left == null && root?.right == null) {
codeTable[root?.char?:'\u0000'] = code
} else {
buildCodeTable(root.left, code + "0", codeTable)
buildCodeTable(root.right, code + "1", codeTable)
}
}
fun compress(input: String): String {
val charFreq = IntArray(256)
for (c in input) {
charFreq[c.toInt()]++
}
val root = buildHuffmanTree(charFreq)
val codeTable = mutableMapOf<Char, String>()
buildCodeTable(root, "", codeTable)
var output = ""
for (c in input) {
output += codeTable[c] ?: ""
}
return output
}
fun decompress(input: String, root: HuffmanNode?): String {
var output = ""
var node = root
for (c in input) {
node = if (c == '0') node?.left else node?.right
if (node?.left == null && node?.right == null) {
output += node?.char
node = root
}
}
return output
}
fun main() {
val input = "Hello, world!"
val compressed = compress(input)
val root = buildHuffmanTree(IntArray(256).apply {
input.forEach { this[it.toInt()]++ }
})
val decompressed = decompress(compressed, root)
println("Input: $input")
println("Compressed: $compressed")
println("Decompressed: $decompressed")
}
In the main
function, a sample input string is compressed using Huffman coding, then the compressed binary string is decompressed back to the original string. The input, compressed, and decompressed strings are then printed to the console.