The following table gives the frequencies of the letters of the English language (including the blank for separating words) in a particular corpus.

blank  18.3%      r  4.8%      y  1.6%
e      10.2%      d  3.5%      p  1.6%
t       7.7%      l  3.4%      b  1.3%
a       6.8%      c  2.6%      v  0.9%
o       5.9%      u  2.4%      k  0.6%
i       5.8%      m  2.1%      j  0.2%
n       5.5%      w  1.9%      x  0.2%
s       5.1%      f  1.8%      q  0.1%
h       4.9%      g  1.7%      z  0.1%

  1. What is the optimum Huffman encoding of this alphabet?
  2. What is the expected number of bits per letter?
  3. Suppose now that we calculate the entropy of these frequencies

H = Σ_{t=0}^{26} p_t log(1/p_t)

(see the box on page 143). Would you expect it to be larger or smaller than your answer above? Explain.

  4. Do you think that this is the limit of how much English text can be compressed? What features of the English language, besides letters and their frequencies, should a better compression scheme take into account?
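As a point of reference for part 3, the entropy can be evaluated numerically from the normalized frequencies. A minimal sketch (the dictionary and function names are assumptions introduced here, not part of the original; "_" stands for the blank):

```python
import math

# Letter frequencies in percent, transcribed from the table above.
FREQ_PERCENT = {
    "_": 18.3, "e": 10.2, "t": 7.7, "a": 6.8, "o": 5.9, "i": 5.8,
    "n": 5.5, "s": 5.1, "h": 4.9, "r": 4.8, "d": 3.5, "l": 3.4,
    "c": 2.6, "u": 2.4, "m": 2.1, "w": 1.9, "f": 1.8, "g": 1.7,
    "y": 1.6, "p": 1.6, "b": 1.3, "v": 0.9, "k": 0.6, "j": 0.2,
    "x": 0.2, "q": 0.1, "z": 0.1,
}

def entropy_bits(freq_percent):
    """H = sum of p_t * log2(1/p_t), with the p_t normalized to sum to 1."""
    total = sum(freq_percent.values())
    probs = [f / total for f in freq_percent.values()]
    return sum(p * math.log2(1 / p) for p in probs)
```

Since the entropy is a lower bound on the expected codeword length of any prefix-free code, it comes out slightly below the Huffman average of part (b).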

Short Answer


Build a Huffman tree from the given letter frequencies to obtain an optimal prefix code, then average the codeword lengths weighted by frequency to get the expected number of bits per letter.

Step by step solution

01

Compression Technique

a)

Huffman encoding is a data compression technique. Assume that the letter frequencies are as shown in Figure 1. We determine an optimal Huffman encoding for this alphabet.

Follow the steps below to construct the optimal Huffman encoding:

• Arrange the letters in increasing order of frequency.

• Pick the two entries with the lowest frequency.

• Merge them and insert the combined entry back into the frequency list.

• Repeat steps 1-3 until only a single tree remains.

Figure 1 depicts this procedure.

Figure 1
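The merging procedure above can be sketched with a binary min-heap. This is an illustrative implementation, not the book's own code; the names `FREQ` and `huffman_codes` are assumptions, and "_" stands for the blank:

```python
import heapq
from itertools import count

# Letter frequencies in percent, from the table above.
FREQ = {
    "_": 18.3, "e": 10.2, "t": 7.7, "a": 6.8, "o": 5.9, "i": 5.8,
    "n": 5.5, "s": 5.1, "h": 4.9, "r": 4.8, "d": 3.5, "l": 3.4,
    "c": 2.6, "u": 2.4, "m": 2.1, "w": 1.9, "f": 1.8, "g": 1.7,
    "y": 1.6, "p": 1.6, "b": 1.3, "v": 0.9, "k": 0.6, "j": 0.2,
    "x": 0.2, "q": 0.1, "z": 0.1,
}

def huffman_codes(freq):
    """Return a prefix-free {symbol: bitstring} code via Huffman's algorithm."""
    tick = count()  # tie-breaker so equal weights never compare the dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest-weight subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in left.items()}    # left branch: 0
        merged.update({s: "1" + bits for s, bits in right.items()})  # right: 1
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]
```

Tie-breaking may produce a different tree than the one in the figures, but every Huffman tree attains the same (optimal) expected code length.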

02

Merging the least frequent letters into a parent node

• In step 1, the letters z and q are merged first, since they are the least frequent (0.1% each). They are combined, and their total weight of 0.2% is assigned to the new parent node. Because this weight is no larger than any remaining frequency, the merged entry is placed before j in the list.

The new list is: [z,q], x, j, k, v, … and so on.

• In step 2, the least frequent entries are [z,q] and x (0.2% each). Combine them and place the result, [[z,q],x], in a parent node. Its weight is now 0.4%, which is larger than j's 0.2%, so it goes after j in the frequency list. The new list looks like this:

j, [[z,q],x], k, v, … and so on.

• When j and [[z,q],x] are merged in turn, j becomes the left child and [[z,q],x] the right child, since j's weight is smaller.

• Continue this procedure until only a single tree remains.

• Label each left branch 0 and each right branch 1. Figure 2 depicts the final result.

Figure 2:

To read off a letter's codeword, start at the root and walk down to that letter's leaf, recording the 0s and 1s of the branches traversed.

The resulting codewords for all letters are:

  • blank: 101 (3 bits)
  • e: 010 (3 bits)
  • t: 1000 (4 bits)
  • a: 1110 (4 bits)
  • o: 1100 (4 bits)
  • i: 0111 (4 bits)
  • n: 0110 (4 bits)
  • s: 0011 (4 bits)
  • h: 0001 (4 bits)
  • r: 0000 (4 bits)
  • d: 11111 (5 bits)
  • l: 11110 (5 bits)
  • c: 00101 (5 bits)
  • u: 00100 (5 bits)
  • m: 100111 (6 bits)
  • w: 100101 (6 bits)
  • f: 100100 (6 bits)
  • g: 110111 (6 bits)
  • y: 110110 (6 bits)
  • p: 110101 (6 bits)
  • b: 110100 (6 bits)
  • v: 1001100 (7 bits)
  • k: 10011011 (8 bits)
  • j: 100110100 (9 bits)
  • x: 1001101011 (10 bits)
  • q: 10011010101 (11 bits)
  • z: 10011010100 (11 bits)
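The listed codewords can be sanity-checked mechanically: a valid Huffman code must be prefix-free, and a full binary code tree satisfies the Kraft equality Σ 2^(-l) = 1. A small sketch (the `CODES` dictionary just transcribes the list above; "_" stands for the blank):

```python
from fractions import Fraction

# Codewords from the list above.
CODES = {
    "_": "101", "e": "010", "t": "1000", "a": "1110", "o": "1100",
    "i": "0111", "n": "0110", "s": "0011", "h": "0001", "r": "0000",
    "d": "11111", "l": "11110", "c": "00101", "u": "00100",
    "m": "100111", "w": "100101", "f": "100100", "g": "110111",
    "y": "110110", "p": "110101", "b": "110100", "v": "1001100",
    "k": "10011011", "j": "100110100", "x": "1001101011",
    "q": "10011010101", "z": "10011010100",
}

words = list(CODES.values())

# No codeword may be a prefix of another (prefix-free property).
prefix_free = all(
    not b.startswith(a)
    for i, a in enumerate(words)
    for j, b in enumerate(words) if i != j
)

# Kraft equality for a full binary tree: exact rational arithmetic via Fraction.
kraft = sum(Fraction(1, 2 ** len(w)) for w in words)
```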

b)

Let l_a denote the length (in bits) of the Huffman codeword for letter a, and f_a its frequency from the table. As given, the frequencies sum to 101 (a rounding artifact). The expected number of bits per letter is the frequency-weighted average

E = (Σ_a f_a × l_a) / (Σ_a f_a)

= (1/101) × [18.3×3 + 10.2×3 + 7.7×4 + 6.8×4 + 5.9×4 + 5.8×4 + 5.5×4 + 5.1×4 + 4.9×4 + 4.8×4 + 3.5×5 + 3.4×5 + 2.6×5 + 2.4×5 + 2.1×6 + 1.9×6 + 1.8×6 + 1.7×6 + 1.6×6 + 1.6×6 + 1.3×6 + 0.9×7 + 0.6×8 + 0.2×9 + 0.2×10 + 0.1×11 + 0.1×11]

= (1/101) × [54.9 + 30.6 + 30.8 + 27.2 + 23.6 + 23.2 + 22 + 20.4 + 19.6 + 19.2 + 17.5 + 17 + 13 + 12 + 12.6 + 11.4 + 10.8 + 10.2 + 9.6 + 9.6 + 7.8 + 6.3 + 4.8 + 1.8 + 2.0 + 1.1 + 1.1]

= 420.1 / 101

≈ 4.16 bits per letter.
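The arithmetic of part (b) can be reproduced mechanically. A small sketch, where `ROWS` just pairs each frequency from the table with its codeword length from part (a) (the variable names are assumptions):

```python
# (frequency in percent, codeword length in bits) for each of the 27 symbols
ROWS = [
    (18.3, 3), (10.2, 3), (7.7, 4), (6.8, 4), (5.9, 4), (5.8, 4),
    (5.5, 4), (5.1, 4), (4.9, 4), (4.8, 4), (3.5, 5), (3.4, 5),
    (2.6, 5), (2.4, 5), (2.1, 6), (1.9, 6), (1.8, 6), (1.7, 6),
    (1.6, 6), (1.6, 6), (1.3, 6), (0.9, 7), (0.6, 8), (0.2, 9),
    (0.2, 10), (0.1, 11), (0.1, 11),
]

total_bits = sum(f * l for f, l in ROWS)  # weighted sum of codeword lengths
total_freq = sum(f for f, _ in ROWS)      # sums to 101 because of rounding
avg_bits = total_bits / total_freq        # expected bits per letter
```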

03

Conclusion 

Huffman encoding assigns shorter codewords to more frequent letters. The expected number of bits per letter is then simply the frequency-weighted average of the codeword lengths, which can be computed with elementary arithmetic.


Most popular questions from this chapter

Suppose we want to find the minimum spanning tree of the following graph.

(a) Run Prim’s algorithm; whenever there is a choice of nodes, always use alphabetic ordering (e.g., start from node A). Draw a table showing the intermediate values of the cost array.

(b) Run Kruskal’s algorithm on the same graph. Show how the disjoint-sets data structure looks at every intermediate stage (including the structure of the directed trees), assuming path compression is used.

Entropy: Consider a distribution over n possible outcomes, with probabilities p_1, p_2, …, p_n.

a. Just for this part of the problem, assume that each p_i is a power of 2 (that is, of the form 1/2^k). Suppose a long sequence of m samples is drawn from the distribution and that for all 1 ≤ i ≤ n, the ith outcome occurs exactly m·p_i times in the sequence. Show that if Huffman encoding is applied to this sequence, the resulting encoding will have length

Σ_{i=1}^{n} m·p_i log(1/p_i)

b. Now consider arbitrary distributions, that is, the probabilities p_i are not restricted to powers of 2. The most commonly used measure of the amount of randomness in the distribution is the entropy

Σ_{i=1}^{n} p_i log(1/p_i)

For what distribution (over n outcomes) is the entropy the largest possible? The smallest possible?

A server has n customers waiting to be served. The service time required by each customer is known in advance: it is t_i minutes for customer i. So if, for example, the customers are served in order of increasing i, then the ith customer has to wait Σ_{j=1}^{i} t_j minutes. We wish to minimize the total waiting time

T = Σ_{i=1}^{n} (time spent waiting by customer i).

Give an efficient algorithm for computing the optimal order in which to process the customers.

Suppose you implement the disjoint-sets data structure using union-by-rank but not path compression. Give a sequence of m union and find operations on n elements that take Ω(m log n) time.

In this problem, we will develop a new algorithm for finding minimum spanning trees. It is based upon the following property:

Pick any cycle in the graph, and let e be the heaviest edge in that cycle. Then there is a minimum spanning tree that does not contain e.

(a) Prove this property carefully.

(b) Here is the new MST algorithm. The input is some undirected graph G = (V, E) (in adjacency list format) with edge weights {w_e}.

sort the edges according to their weights
for each edge e ∈ E, in decreasing order of w_e:
    if e is part of a cycle of G:
        G = G - e (that is, remove e from G)
return G

Prove that this algorithm is correct.

(c) On each iteration, the algorithm must check whether there is a cycle containing a specific edge e. Give a linear-time algorithm for this task, and justify its correctness.

(d) What is the overall time taken by this algorithm, in terms of |E|? Explain your answer.
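The reverse-delete algorithm of part (b) above can be sketched directly; here the cycle check is a breadth-first search for an alternate path between the edge's endpoints. This is an illustration under assumed names (`reverse_delete_mst`, `has_alternate_path`), not a solution to parts (c) and (d):

```python
from collections import defaultdict, deque

def has_alternate_path(adj, u, v, skip):
    """BFS from u to v in the current graph, ignoring the single edge `skip`."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if frozenset((x, y)) == skip:
                continue  # the edge under test is not allowed on the path
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def reverse_delete_mst(edges):
    """edges: list of (weight, u, v) for a connected undirected graph.
    Returns the set of surviving edges as frozensets of endpoints."""
    adj = defaultdict(set)
    for _, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    kept = set(frozenset((u, v)) for _, u, v in edges)
    for w, u, v in sorted(edges, reverse=True):  # heaviest edge first
        e = frozenset((u, v))
        # e lies on a cycle iff u and v remain connected without e
        if has_alternate_path(adj, u, v, e):
            kept.discard(e)
            adj[u].discard(v)
            adj[v].discard(u)
    return kept
```

Deleting the heaviest cycle edge is always safe by the property proved in part (a), so the surviving edges form a minimum spanning tree.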
