
Huffman Codes

Name: [Solved] CSE100 Algorithm Design and Analysis Lab 14,

Suppose that we have to store a sequence of symbols (a file) efficiently, namely, we want to minimize the amount of memory needed. For simplicity, we assume that the symbols are restricted to the first 6 letters of the alphabet. For example, assume that the frequencies of the symbols to be stored are the following:

symbol   frequency
A        1000
B        150
C        200
D        800
E        300
F        50
Total    2500

Since we have to store 6 different symbols, the obvious approach is to encode each of them with 3 bits, because 3 bits can encode 2³ = 8 different symbols. With this encoding, we need 2500 × 3 = 7500 bits to store the symbols above.
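A quick check of this arithmetic, as a minimal Python sketch (the helper name `fixed_length_bits` is hypothetical). It finds the smallest number of bits k with 2^k at least the number of symbols, then multiplies by the total frequency:

```python
# Frequencies from the table above.
freq = {"A": 1000, "B": 150, "C": 200, "D": 800, "E": 300, "F": 50}

def fixed_length_bits(freq):
    """Return (bits per symbol, total bits) for a fixed-length code."""
    n = len(freq)
    # Smallest k with 2**k >= n; for n = 6 this is 3, since 2**3 = 8.
    bits_per_symbol = (n - 1).bit_length()
    return bits_per_symbol, bits_per_symbol * sum(freq.values())

print(fixed_length_bits(freq))  # (3, 7500)
```

Using `int.bit_length` keeps the computation in exact integer arithmetic instead of rounding a floating-point logarithm.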

A different strategy is the following. Instead of assigning each symbol a code of the same length (i.e., number of bits), we assign shorter codes to more frequent symbols and longer codes to less frequent symbols. One possible encoding that follows this strategy is:

symbol   encoding
A        0
B        10101
C        1011
D        11
E        100
F        10100

According to this encoding the number of required bits is:

1000 × 1 + 150 × 5 + 200 × 4 + 800 × 2 + 300 × 3 + 50 × 5 = 5300
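The same sum can be computed mechanically: each symbol contributes its frequency times the length of its code. A minimal Python sketch, with both tables copied from above and `encoded_length` as a hypothetical helper name:

```python
# Frequencies and variable-length codes from the tables above.
freq = {"A": 1000, "B": 150, "C": 200, "D": 800, "E": 300, "F": 50}
code = {"A": "0", "B": "10101", "C": "1011", "D": "11", "E": "100", "F": "10100"}

def encoded_length(freq, code):
    """Total bits: sum over symbols of frequency times code length."""
    return sum(f * len(code[s]) for s, f in freq.items())

print(encoded_length(freq, code))  # 5300
```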

This idea underlies the programs used to compress files: first they analyze the input, then they choose the codes, and finally they recode the input according to the chosen codes.
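The three phases can be sketched in a few lines of Python. This is a minimal illustration only: `compress` is a hypothetical name, and to keep it short the sketch skips the second phase and reuses the fixed example table instead of deriving codes from the counted frequencies (deriving them is the job of the greedy algorithm described later).

```python
from collections import Counter

# Example code table from above, reused as-is (an assumption for brevity).
code = {"A": "0", "B": "10101", "C": "1011", "D": "11", "E": "100", "F": "10100"}

def compress(text, code):
    # Phase 1: analyze the input by counting symbol frequencies.
    freq = Counter(text)
    # Phase 2 (choosing codes from freq) is skipped here.
    # Phase 3: recode the input by concatenating each symbol's code.
    bits = "".join(code[ch] for ch in text)
    return freq, bits

freq, bits = compress("ADDAE", code)
print(bits)  # 011110100
```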

While this idea reduces the space requirements, variable-length codes raise a problem. Once we have encoded a file with a variable-length code, we must also be able to decode it back to the original format (i.e., once we have compressed the file, we want to be able to decompress it). This works only if no code assigned to a character is a prefix of the code of any other character. If this property does not hold, decompression is ambiguous. For example, with the codes A = 0, B = 01 and C = 1, the bits 01 could be read either as B or as AC.
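Checking the prefix property by hand gets tedious for larger alphabets. The Python sketch below (hypothetical helper `is_prefix_free`) relies on a standard observation: after sorting the codes lexicographically, if any code is a prefix of another, it is also a prefix of the code immediately after it, so only adjacent pairs need to be compared.

```python
# Example code table from above.
code = {"A": "0", "B": "10101", "C": "1011", "D": "11", "E": "100", "F": "10100"}

def is_prefix_free(codes):
    """Return True if no code is a prefix of another code."""
    ordered = sorted(codes)
    # Any prefix relation shows up between lexicographic neighbours;
    # duplicate codes are also caught, since a string starts with itself.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(is_prefix_free(code.values()))       # True
print(is_prefix_free(["0", "01", "1"]))    # False: 0 is a prefix of 01
```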

It is easy to check that in this example no code is a prefix of any other code. For instance, no code starts with 0 except the code of A, so while decompressing the file, if we read a 0 we know the symbol is A. If we read 11, we know the symbol is D, because no other code starts with 11. And so on.

How do we assign the codes? This is done with a greedy algorithm: we assign the shortest code to the most frequent character, the second shortest code to the second most frequent character, and so on. The figure below illustrates the first few stages of the algorithm.
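The left-to-right reading just described can be sketched as a small decoder. This is a minimal Python illustration (the helper name `decode` is hypothetical) that accumulates bits until they form a complete code word; since the code is prefix-free, the first complete match is the only possible one.

```python
# Example code table from above.
code = {"A": "0", "B": "10101", "C": "1011", "D": "11", "E": "100", "F": "10100"}

def decode(bits, code):
    """Decode a bit string produced with a prefix-free code table."""
    lookup = {c: s for s, c in code.items()}  # code word -> symbol
    out, current = [], ""
    for b in bits:
        current += b
        # Prefix-freeness guarantees the first complete match is correct.
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

print(decode("011110100", code))  # ADDAE
```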

[Figure: first stages of the algorithm. Step 1 shows six single-node trees: A: 1000, B: 150, C: 200, D: 800, E: 300, F: 50.]

Given N characters with their frequencies, the algorithm initially builds N trees, each consisting of a single node (step 1 in the figure). Then, iteratively, it joins the two trees whose roots have the lowest frequencies under a new root whose frequency is the sum of the two (steps 2, 3, etc. in the figure). The tree with the lowest root frequency becomes the left child and the tree with the second-lowest root frequency becomes the right child. Left children are associated with the bit 0, right children with the bit 1. Internal nodes (i.e., the newly created roots) can be thought of as dummy nodes storing a fictitious character that does not appear in the sequence. The procedure is repeated until only one tree remains. At that point, the code of a symbol is obtained by concatenating the 0s and 1s encountered along the path from the root down to that symbol.
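The procedure above maps naturally onto a priority queue. Below is one possible Python sketch using the standard `heapq` module; `huffman_codes` is a hypothetical name, leaves are the symbols themselves and internal dummy nodes are `(left, right)` tuples. The text does not say how to break ties between equal frequencies, so this sketch assumes the tree created earlier is popped first (via a creation counter stored in each heap entry).

```python
import heapq

def huffman_codes(freq):
    """Build Huffman codes for a non-empty dict {symbol: frequency}.

    Ties between equal frequencies are broken by creation order
    (an assumption; the lab text does not specify a tie-break rule).
    """
    # Heap entries are (frequency, creation_order, tree); a tree is either a
    # symbol (leaf) or a (left, right) tuple (internal dummy node).
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    order = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency -> left child (bit 0)
        f2, _, right = heapq.heappop(heap)  # second lowest -> right child (bit 1)
        heapq.heappush(heap, (f1 + f2, order, (left, right)))
        order += 1

    codes = {}

    def walk(tree, prefix):
        # Concatenate 0 for each left step and 1 for each right step.
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # a lone symbol still needs one bit

    walk(heap[0][2], "")
    return codes

freq = {"A": 1000, "B": 150, "C": 200, "D": 800, "E": 300, "F": 50}
codes = huffman_codes(freq)
for s in sorted(codes):
    print(s, codes[s])
```

With the tie-break assumed here, this prints A 0, B 10111, C 1010, D 11, E 100 and F 10110. The codes for A, D and E match the table above, while C and the pair F/B trade places because C and the merged F+B tree both have frequency 200. The code lengths, and therefore the 5300-bit total, are the same.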

Note that the greedy strategy works in reverse: the tree is built bottom-up, starting from the least frequent symbols. Symbols with low frequencies therefore end up deep in the tree (i.e., they get long codes), while symbols with high frequencies stay near the root (i.e., they get short codes).

Input structure

The first line of the input contains the number of characters N in the alphabet. The next N lines contain the frequencies of the symbols, one per line: the i-th of these lines holds the frequency of the i-th symbol.

Output structure

The output contains N lines. The i-th line prints the Huffman code of the i-th symbol in the input.
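Putting it together for this input and output format, here is one possible Python sketch that reads N and the N frequencies from standard input and prints one code per line. The symbols are simply the indices 0 to N-1, ties are broken by creation order (an assumption, since the text does not fix a rule), and the names `huffman_codes` and `solve` are hypothetical. A grader that expects a specific tie-break may need a different rule.

```python
import heapq
import sys

def huffman_codes(freqs):
    """Return a list of code strings, one per input frequency (same order)."""
    # Heap entries: (frequency, creation_order, tree). Leaves are symbol
    # indices, internal nodes are (left, right) tuples; ties go to the
    # older tree (assumed rule).
    heap = [(f, i, i) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    order = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest -> left child (bit 0)
        f2, _, right = heapq.heappop(heap)  # second lowest -> right child (bit 1)
        heapq.heappush(heap, (f1 + f2, order, (left, right)))
        order += 1
    codes = [""] * len(freqs)
    stack = [(heap[0][2], "")]
    while stack:  # iterative walk avoids deep recursion on skewed trees
        tree, prefix = stack.pop()
        if isinstance(tree, tuple):
            stack.append((tree[0], prefix + "0"))
            stack.append((tree[1], prefix + "1"))
        else:
            codes[tree] = prefix or "0"  # a lone symbol still needs one bit
    return codes

def solve(text):
    """Parse N and N frequencies, return the N codes joined by newlines."""
    data = text.split()
    if not data:
        return ""
    n = int(data[0])
    freqs = [int(x) for x in data[1:1 + n]]
    return "\n".join(huffman_codes(freqs))

if __name__ == "__main__":
    print(solve(sys.stdin.read()))
```

For example, feeding it the frequencies from the table above (`6 1000 150 200 800 300 50`, whitespace-separated) prints 0, 10111, 1010, 11, 100 and 10110 on separate lines. The tree walk uses an explicit stack rather than recursion so that very skewed trees do not hit Python's recursion limit.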

Related products

[Solved] CSE100 Lab 4 Watermelon Program

[Solved] CSE100 Algorithm Design and Analysis Lab 15

[Solved] CSE100 Algorithm Design and Analysis Lab #01

[Solved] CSE100 Algorithm Design and Analysis Lab 07,

[Solved] CSE100 Lab 8-Matrix Chain Multiplication

[Solved] CSE100 Lab 4b-Quick-Sort