The basic intuition behind Huffman’s algorithm, that frequent blocks should have short encodings and infrequent blocks should have long encodings, is also at work in English, where typical words like I, you, is, and, to, from, and so on are short, and rarely used words like velociraptor are longer.

However, words like fire!, help!, and run! are short not because they are frequent, but perhaps because time is precious in situations where they are used.

To make things theoretical, suppose we have a file composed of m different words, with frequencies $f_1, \ldots, f_m$. Suppose also that for the $i$th word, the cost per bit of encoding is $c_i$. Thus, if we find a prefix-free code where the $i$th word has a codeword of length $l_i$, then the total cost of the encoding will be $\sum_{i=1}^{m} f_i \cdot c_i \cdot l_i$.

Show how to modify Huffman’s algorithm to find the prefix-free encoding of minimum total cost.

Short Answer

Expert verified

Run Huffman's algorithm with the weight of the $i$th word set to $f_i \cdot c_i$ instead of $f_i$: repeatedly combine the two trees of smallest total weight until you reach the root node. Then, for each word, traverse the tree to read off its codeword.

Step by step solution

01

Step 1: Minimum total cost of prefix-free encoding

Let's say we have a file containing $m$ distinct words with frequencies $f_1, f_2, \ldots, f_m$, and the cost per bit of encoding the $i$th word is denoted $c_i$.

02

Step 2: Cost of prefix-free encoding

For an ordinary prefix-free encoding, the cost that Huffman's algorithm minimizes is given below:

$$\text{cost} = \sum_{i=1}^{m} f_i \, l_i$$

Where,

“$f_i$” denotes the frequency of the $i$th word.

“$l_i$” denotes the codeword length of the $i$th word.

For the problem at hand, where each bit of the $i$th word's codeword costs $c_i$, the total cost of the encoding becomes:

$$\text{cost} = \sum_{i=1}^{m} f_i \, c_i \, l_i$$

Where,

“$f_i \cdot c_i$” denotes the weight of the $i$th word: its frequency times its cost per bit.

“$l_i$” denotes the codeword length of the $i$th word.
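As a quick check of this objective, here is a minimal Python sketch (the helper name total_cost and the sample numbers are illustrative, not from the textbook) that evaluates the total cost once the codeword lengths are known:

```python
def total_cost(freqs, costs, lengths):
    # Objective from Step 2: sum over all words of f_i * c_i * l_i.
    return sum(f * c * l for f, c, l in zip(freqs, costs, lengths))

# For example, with f = (0.7, 0.3), c = (1, 2), l = (1, 2):
# total_cost((0.7, 0.3), (1, 2), (1, 2)) == 0.7*1*1 + 0.3*2*2 == 1.9
```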

03

Step 3: Minimum cost of prefix-free encoding

Huffman developed a greedy algorithm that produces a minimum-cost prefix code; the generated code is called a Huffman encoding. To minimize the modified objective, run the same algorithm with the weight of the $i$th word set to $f_i \cdot c_i$ instead of $f_i$.

The modified procedure is simple:

• Start with one single-node tree per word, where the tree for the $i$th word carries weight $f_i \cdot c_i$.

• Repeatedly join the two trees of smallest total weight until a single root node remains.

• Then, for each word, traverse the tree from the root to that word's leaf to write down the codeword (see the sketch after this list):

o a “0” bit for each left branch taken;

o a “1” bit for each right branch taken.
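Below is a minimal Python sketch of this modification, assuming only the standard heapq module; the function name build_cost_huffman and the tuple-based tree representation are illustrative choices, not part of the textbook's presentation. The only change from ordinary Huffman is that each word enters the priority queue with weight $f_i \cdot c_i$.

```python
import heapq

def build_cost_huffman(freqs, costs):
    """Return {word index: codeword} minimizing sum of f_i * c_i * l_i."""
    # Heap entries are (weight, tiebreak, tree); the tiebreak integer
    # keeps Python from ever comparing two trees when weights are equal.
    # A tree is either a leaf (an int word index) or a (left, right) pair.
    heap = [(f * c, i, i) for i, (f, c) in enumerate(zip(freqs, costs))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # smallest combined weight
        w2, _, right = heapq.heappop(heap)  # second smallest
        heapq.heappush(heap, (w1 + w2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node
            walk(tree[0], prefix + "0")    # "0" for each left branch
            walk(tree[1], prefix + "1")    # "1" for each right branch
        else:                              # leaf: record the codeword
            codes[tree] = prefix or "0"    # lone-word edge case
    walk(heap[0][2], "")
    return codes
```

The correctness argument is the same exchange argument as for ordinary Huffman, applied to the weights $w_i = f_i \cdot c_i$: the two words of smallest weight can be assumed to be siblings at the deepest level of some optimal tree.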

Note that the depth $d_i$ of the $i$th word's leaf in the tree equals the length $l_i$ of that word's codeword.

Then, the minimum cost of prefix-free encoding can also be represented as,

$$\text{cost} = \sum_{i=1}^{m} f_i \, c_i \, d_i$$

Where,

“$f_i \cdot c_i$” denotes the weight (frequency times cost per bit) of the $i$th word.

“$d_i$” denotes the depth of the $i$th word's leaf.

Therefore, Huffman's algorithm run on the weights $f_i \cdot c_i$ produces the prefix-free encoding of minimum total cost.

For example:

Construction of the Huffman encoding tree:

Inputs are,

• Character A with frequency of 31%, character C with frequency of 20%, character G with frequency of 9%, and character T with frequency of 40%.

• The Huffman tree (weights shown in square brackets) is built by merging G [9%] with C [20%] into a node [29%], merging that node with A [31%] into [60%], and finally merging [60%] with T [40%] at the root [100%].

From the Huffman encoding tree,

• Traversing the tree from the root to character “A” gives 01.

• Traversing the tree from the root to character “C” gives 000.

• Traversing the tree from the root to character “G” gives 001.

• Traversing the tree from the root to character “T” gives 1.

Hence, the Huffman encoding table is:

Character   Frequency   Codeword
A           31%         01
C           20%         000
G           9%          001
T           40%         1
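For instance, the sketch from Step 3 reproduces this example when all per-bit costs are equal, since uniform costs reduce the modified algorithm to ordinary Huffman. The exact 0/1 labels depend on left/right orientation at each internal node, but the codeword lengths match the table above:

```python
freqs = [0.31, 0.20, 0.09, 0.40]        # A, C, G, T
costs = [1.0, 1.0, 1.0, 1.0]            # equal cost per bit
codes = build_cost_huffman(freqs, costs)
for i, ch in enumerate("ACGT"):
    print(ch, codes[i])
# Codeword lengths come out as A:2, C:3, G:3, T:1,
# matching the tree above up to swapping 0/1 at internal nodes.
```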


