The basic intuition behind Huffman’s algorithm, that frequent blocks should have short encodings and infrequent blocks should have long encodings, is also at work in English, where typical words like I, you, is, and, to, from, and so on are short, and rarely used words like velociraptor are longer.

However, words like fire!, help!, and run! are short not because they are frequent, but perhaps because time is precious in situations where they are used.

To make things theoretical, suppose we have a file composed of m different words, with frequencies $f_1, \ldots, f_m$. Suppose also that for the $i$th word, the cost per bit of encoding is $c_i$. Thus, if we find a prefix-free code where the $i$th word has a codeword of length $l_i$, then the total cost of the encoding will be $\sum_{i=1}^{m} f_i \cdot c_i \cdot l_i$.

Show how to modify Huffman’s algorithm to find the prefix-free encoding of minimum total cost.

Short Answer

Expert verified

Run Huffman's algorithm with the weight of the $i$th word set to $f_i \cdot c_i$ instead of $f_i$: repeatedly combine the two trees of smallest total weight until you reach the root node. Then, for each word, traverse the tree to read off its codeword.

Step by step solution

01

Step 1: Minimum total cost of prefix-free encoding

Let's say we have a file containing $m$ distinct words with frequencies $f_1, f_2, \ldots, f_m$, and the cost per bit of encoding the $i$th word is denoted $c_i$.

02

Step 2: Cost of prefix-free encoding

For an ordinary prefix-free encoding, the cost that Huffman's algorithm minimizes is given below:

$$\text{cost} = \sum_{i=1}^{m} f_i \, l_i$$

Where,

“$f_i$” denotes the frequency of the $i$th word.

“$l_i$” denotes the codeword length of the $i$th word.

For the problem at hand, where each bit of the $i$th word's codeword costs $c_i$, the total cost of the encoding becomes:

$$\text{cost} = \sum_{i=1}^{m} f_i \, c_i \, l_i$$

Where,

“$f_i \cdot c_i$” denotes the weight of the $i$th word: its frequency times its cost per bit.

“$l_i$” denotes the codeword length of the $i$th word.
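As a quick check of this objective, here is a minimal Python sketch (the helper name total_cost and the sample numbers are illustrative, not from the textbook) that evaluates the total cost once the codeword lengths are known:

```python
def total_cost(freqs, costs, lengths):
    # Objective from Step 2: sum over all words of f_i * c_i * l_i.
    return sum(f * c * l for f, c, l in zip(freqs, costs, lengths))

# For example, with f = (0.7, 0.3), c = (1, 2), l = (1, 2):
# total_cost((0.7, 0.3), (1, 2), (1, 2)) == 0.7*1*1 + 0.3*2*2 == 1.9
```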

03

Step 3: Minimum cost of prefix-free encoding

Huffman developed a greedy algorithm that produces a minimum-cost prefix code; the generated code is called a Huffman encoding. To minimize the modified objective, run the same algorithm with the weight of the $i$th word set to $f_i \cdot c_i$ instead of $f_i$.

The modified procedure is simple:

• Start with one single-node tree per word, where the tree for the $i$th word carries weight $f_i \cdot c_i$.

• Repeatedly join the two trees of smallest total weight until a single root node remains.

• Then, for each word, traverse the tree from the root to that word's leaf to write down the codeword (see the sketch after this list):

o a “0” bit for each left branch taken;

o a “1” bit for each right branch taken.
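Below is a minimal Python sketch of this modification, assuming only the standard heapq module; the function name build_cost_huffman and the tuple-based tree representation are illustrative choices, not part of the textbook's presentation. The only change from ordinary Huffman is that each word enters the priority queue with weight $f_i \cdot c_i$.

```python
import heapq

def build_cost_huffman(freqs, costs):
    """Return {word index: codeword} minimizing sum of f_i * c_i * l_i."""
    # Heap entries are (weight, tiebreak, tree); the tiebreak integer
    # keeps Python from ever comparing two trees when weights are equal.
    # A tree is either a leaf (an int word index) or a (left, right) pair.
    heap = [(f * c, i, i) for i, (f, c) in enumerate(zip(freqs, costs))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # smallest combined weight
        w2, _, right = heapq.heappop(heap)  # second smallest
        heapq.heappush(heap, (w1 + w2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node
            walk(tree[0], prefix + "0")    # "0" for each left branch
            walk(tree[1], prefix + "1")    # "1" for each right branch
        else:                              # leaf: record the codeword
            codes[tree] = prefix or "0"    # lone-word edge case
    walk(heap[0][2], "")
    return codes
```

The correctness argument is the same exchange argument as for ordinary Huffman, applied to the weights $w_i = f_i \cdot c_i$: the two words of smallest weight can be assumed to be siblings at the deepest level of some optimal tree.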

Note that the depth $d_i$ of the $i$th word's leaf in the tree equals the length $l_i$ of that word's codeword.

Then, the minimum cost of prefix-free encoding can also be represented as,

$$\text{cost} = \sum_{i=1}^{m} f_i \, c_i \, d_i$$

Where,

“$f_i \cdot c_i$” denotes the weight (frequency times cost per bit) of the $i$th word.

“$d_i$” denotes the depth of the $i$th word's leaf.

Therefore, Huffman's algorithm run on the weights $f_i \cdot c_i$ produces the prefix-free encoding of minimum total cost.

For example:

Construction of the Huffman encoding tree:

Inputs are,

• Character A with frequency of 31%, character C with frequency of 20%, character G with frequency of 9%, and character T with frequency of 40%.

• The Huffman tree (weights shown in square brackets) is built by merging G [9%] with C [20%] into a node [29%], merging that node with A [31%] into [60%], and finally merging [60%] with T [40%] at the root [100%].

From the Huffman encoding tree,

• Traversing the tree from the root to character “A” gives 01.

• Traversing the tree from the root to character “C” gives 000.

• Traversing the tree from the root to character “G” gives 001.

• Traversing the tree from the root to character “T” gives 1.

Hence, the Huffman encoding table is:

Character   Frequency   Codeword
A           31%         01
C           20%         000
G           9%          001
T           40%         1
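For instance, the sketch from Step 3 reproduces this example when all per-bit costs are equal, since uniform costs reduce the modified algorithm to ordinary Huffman. The exact 0/1 labels depend on left/right orientation at each internal node, but the codeword lengths match the table above:

```python
freqs = [0.31, 0.20, 0.09, 0.40]        # A, C, G, T
costs = [1.0, 1.0, 1.0, 1.0]            # equal cost per bit
codes = build_cost_huffman(freqs, costs)
for i, ch in enumerate("ACGT"):
    print(ch, codes[i])
# Codeword lengths come out as A:2, C:3, G:3, T:1,
# matching the tree above up to swapping 0/1 at internal nodes.
```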


