Question: Suppose the symbols a, b, c, d, e occur with frequencies 1/2, 1/4, 1/8, 1/16, 1/16, respectively.

(a) What is the Huffman encoding of the alphabet?

(b) If this encoding is applied to a file consisting of 1,000,000 characters with the given frequencies, what is the length of the encoded file in bits?

Short Answer

Answers

  1. The Huffman encoding of the symbols a, b, c, d, e is 0, 10, 110, 1110, 1111, respectively.

  2. The length of the encoded file is 1875000 bits.

Step by step solution

01

Define Huffman Encoding 

A Huffman encoding is represented by a full binary tree, that is, a binary tree in which every node has either zero or two children. The symbols of the alphabet sit at the leaves, and each codeword is generated by the path from the root to the corresponding leaf, where a left branch is read as 0 and a right branch is read as 1.
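As a minimal sketch of how codewords are read off such a tree, the snippet below walks a small full binary tree and records the root-to-leaf path of each symbol. The nested-tuple representation, the function name `codewords`, and the three-symbol tree are assumptions made for illustration only; the tree is not the one from this exercise. Left is taken as 0 and right as 1.

```python
# A tree is either a leaf (a symbol string) or a pair (left, right).
# This tiny example tree is hypothetical, not the one from the exercise.
tree = ("x", ("y", "z"))


def codewords(node, prefix=""):
    """Map each leaf symbol to its root-to-leaf path (left = 0, right = 1)."""
    if isinstance(node, str):
        # Leaf: the path walked so far is the codeword.
        return {node: prefix}
    left, right = node  # internal node of a full binary tree: two children
    codes = codewords(left, prefix + "0")
    codes.update(codewords(right, prefix + "1"))
    return codes


print(codewords(tree))  # {'x': '0', 'y': '10', 'z': '11'}
```

Because symbols appear only at leaves, no codeword can be a prefix of another, which is what makes the encoding uniquely decodable.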

02

Determine Huffman encoding of given alphabets.

(a)

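First, sort the given frequencies in increasing order: 1/16 (d), 1/16 (e), 1/8 (c), 1/4 (b), 1/2 (a).

Huffman's algorithm repeatedly merges the two subtrees of lowest total frequency. Here it first merges d and e (total 1/8), then merges that subtree with c (total 1/4), then with b (total 1/2), and finally with a (total 1). The result is a full binary tree in which a hangs directly off the root and d and e are the deepest leaves.

Each codeword is read along the path from the root node to a leaf node. So the codeword of a is 0 and the codeword of b is 10. The codewords of the other symbols are found in the same way, as shown in the table.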

Alphabet    Codeword
a           0
b           10
c           110
d           1110
e           1111

Thus, the Huffman encoding of a, b, c, d, e is 0, 10, 110, 1110, 1111, respectively.
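The construction can be checked with a short sketch of the standard greedy Huffman procedure, using a priority queue. The function name `huffman` and the tie-breaking counter are illustrative choices, not part of the original solution. Other implementations may swap 0 and 1 on some branches and so produce different codewords, but the codeword lengths for these frequencies are always 1, 2, 3, 4, 4.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman(freqs):
    """Return {symbol: codeword} for a dict mapping symbols to frequencies."""
    tiebreak = count()  # unique counter so equal weights never compare subtrees
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two subtrees with the smallest total frequency.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # left branch is 0
            walk(node[1], prefix + "1")  # right branch is 1
        else:                            # leaf: a symbol
            codes[node] = prefix

    walk(root, "")
    return codes


freqs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8),
         "d": Fraction(1, 16), "e": Fraction(1, 16)}
codes = huffman(freqs)
lengths = {sym: len(code) for sym, code in codes.items()}
print(lengths)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 4}
```

Exact fractions are used instead of floats so that ties between equal weights (for example c and the merged d-e subtree, both 1/8) are detected exactly.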

03

Calculate the length of the encoded file in bits

(b)

Because every frequency here is a power of 1/2, each Huffman codeword has length exactly log(1/p_i). The number of bits needed to encode the file is therefore the number of characters in the file multiplied by the entropy of the distribution. Entropy is a measure of how much randomness a distribution contains.

$$\text{Entropy} = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$$

Here, $n$ is the number of possible outcomes and $p_i$ is the probability of the $i$-th outcome. All logarithms are base 2.

Consider the formula for the number of bits.

$$N = \sum_{i=1}^{n} m \, p_i \log \frac{1}{p_i}$$

Here, $m$ is the number of characters in the file.

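$$N = \sum_{i=1}^{n} m \, p_i \log \frac{1}{p_i} = m \times \sum_{i=1}^{5} p_i \log \frac{1}{p_i}$$

$$= m \times \left( \frac{1}{2} \log 2 + \frac{1}{4} \log 4 + \frac{1}{8} \log 8 + \frac{1}{16} \log 16 + \frac{1}{16} \log 16 \right)$$

$$= 1000000 \times \left( \frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{16} \times 4 + \frac{1}{16} \times 4 \right)$$

N is further simplified as:

$$N = 1000000 \times \left( \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{1}{4} + \frac{1}{4} \right) = 1000000 \times \left( \frac{3}{2} + \frac{3}{8} \right) = 1000000 \times \frac{15}{8} = 1875000$$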

Therefore, the length of the encoded file is 1875000 bits.
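The arithmetic can be double-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the codeword lengths are copied from the table in part (a). It computes the length two ways, once from the codeword lengths and once as the file size times the entropy, and the two agree because every frequency is a power of 1/2.

```python
from fractions import Fraction
from math import log2

m = 1_000_000  # number of characters in the file
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8),
         "d": Fraction(1, 16), "e": Fraction(1, 16)}
code_lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 4}  # from part (a)

# Direct count: each symbol occurs m * p times and costs len(codeword) bits.
bits_from_code = sum(m * probs[s] * code_lengths[s] for s in probs)

# Entropy in bits per character, with base-2 logarithms.
entropy = sum(float(p) * log2(float(1 / p)) for p in probs.values())
bits_from_entropy = m * entropy

print(bits_from_code)     # 1875000
print(entropy)            # 1.875
print(bits_from_entropy)  # 1875000.0
```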


Most popular questions from this chapter

Sometimes we want light spanning trees with certain special properties. Here’s an example.

Input: Undirected graph $G = (V, E)$; edge weights $w_e$; subset of vertices $U \subset V$

Output: The lightest spanning tree in which the nodes of U are leaves (there might be other leaves in this tree as well).

(The answer isn’t necessarily a minimum spanning tree.)

Give an algorithm for this problem which runs in $O(|E| \log |V|)$ time. (Hint: When you remove nodes $U$ from the optimal solution, what is left?)

A long string consists of the four characters A, C, G, T; they appear with frequency 31%, 20%, 9%, and 40%, respectively. What is the Huffman encoding of these four characters?

We use Huffman's algorithm to obtain an encoding of alphabet {a, b, c} with frequencies $f_a, f_b, f_c$. In each of the following cases, either give an example of frequencies $(f_a, f_b, f_c)$ that would yield the specified code, or explain why the code cannot possibly be obtained (no matter what the frequencies are).

(a) Code: {0, 10, 11}

(b) Code: {0, 1, 00}

(c) Code: {10, 01, 00}

Entropy: Consider a distribution over $n$ possible outcomes, with probabilities $p_1, p_2, \ldots, p_n$.

a. Just for this part of the problem, assume that each $p_i$ is a power of 2 (that is, of the form $1/2^k$). Suppose a long sequence of $m$ samples is drawn from the distribution and that for all $1 \le i \le n$, the $i$-th outcome occurs exactly $m p_i$ times in the sequence. Show that if Huffman encoding is applied to this sequence, the resulting encoding will have length

$$\sum_{i=1}^{n} m \, p_i \log \frac{1}{p_i}$$

b. Now consider arbitrary distributions, that is, the probabilities $p_i$ are not restricted to powers of 2. The most commonly used measure of the amount of randomness in the distribution is the entropy

$$\sum_{i=1}^{n} p_i \log \frac{1}{p_i}$$

For what distribution (over $n$ outcomes) is the entropy the largest possible? The smallest possible?

Show that for any integer n that is a power of 2, there is an instance of the set cover problem (Section 5.4) with the following properties:

  1. There are n elements in the base set.
  2. The optimal cover uses just two sets.
  3. The greedy algorithm picks at least log n sets.

Thus the approximation ratio we derived in the chapter is tight.
