The standard Huffman encoding uses a binary target alphabet (0,1).

Question

The standard Huffman encoding uses a binary target alphabet (0,1). Assume that you’re given a text in alphabet Σ and you compress one input symbol from the alphabet at a time.

    • Could you adapt the algorithm so that it utilizes three output symbols (0,1,2)?
    • What about k target symbols (k < alphabet size)?
    • What would happen if you used Σ as the output alphabet for Huffman encoding? Would the text remain the same? Would it have the same length?
    • What changes if the input is words (i.e. sequences from the Σ alphabet with one symbol representing the word boundary)? What changes if you are allowed to use phrases (i.e. ignore the word boundary) if it helps compression?

Summary

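  1. If the input message has fewer than three distinct letters, using an output alphabet of size 3 is pointless, because the third symbol is never needed. If the input contains three or more distinct letters, it can help reduce the number of symbols in the encoded message.
  2. The same holds for k target symbols: if the input has fewer distinct letters than the output alphabet has symbols, there is no point in using an output alphabet that large, since the extra symbols go unused. Otherwise, the encoded message gets shorter as the output alphabet grows.
  3. If Σ itself is used as the output alphabet, the code of each letter shrinks to a single symbol, so the encoded text has the same length as the input, although its letters may be relabeled.
  4. In most cases, words and phrases will be long inputs with a large number of letters.
  5. In that situation, the larger output alphabet would help reduce the size of the encoded data.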

Explanation

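1). Generally, the target alphabet in Huffman encoding is Σ = {0, 1}.

If Σ = {0, 1, 2}, the algorithm can still be adapted to produce a code for each letter: at every step, merge the three least frequent nodes instead of two, adding a zero-frequency dummy letter when needed so that every merge has three nodes.

The adaptation pays off when the input contains three or more distinct letters.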

2). For ‘k’ target symbols (such that k < alphabet size), the same adaptation applies. However, there is no point in using an alphabet of size 5, S = {0,1,2,3,4}, if the input has only 3 unique letters. Then the extra symbols are of no use.
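The adaptation to k output symbols can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name huffman_codes and the zero-weight dummy padding are assumptions made for this sketch, and codewords are written with the digits 0..k-1, so it assumes 2 <= k <= 10.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text, k=3):
    """Return {symbol: codeword} over the digits 0..k-1 (hypothetical helper)."""
    freqs = Counter(text)
    if len(freqs) == 1:
        # A single distinct symbol still needs a one-digit codeword.
        return {sym: "0" for sym in freqs}
    tie = count()  # unique tie-breaker so the heap never compares subtrees
    # Heap entries are (weight, tie, subtree); a subtree is a symbol,
    # None (a dummy leaf), or a list of child subtrees.
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    # Pad with zero-weight dummies so every merge takes exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(k)]
        weight = sum(w for w, _, _ in group)
        children = [node for _, _, node in group]
        heapq.heappush(heap, (weight, next(tie), children))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):
            for digit, child in enumerate(node):
                walk(child, prefix + str(digit))
        elif node is not None:  # skip dummy leaves
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

With k equal to the number of distinct letters, a single merge puts every letter directly under the root, so each codeword is one symbol long. Which digit ends up on which branch depends on how ties are broken, so only the code lengths should be compared across implementations.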

3). Output alphabet, Σ = {0, 1, 2}

Input = “Mississippi”

[Figure: Huffman tree]

Character   Frequency
m           1
s           4
p           2
i           4

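Character   Code length   Code
m           2             20
p           2             21
s           1             1
i           1             0

A zero-frequency dummy letter is merged with m and p first, so the code 22 stays unused; s and i then sit directly under the root.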
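To check the effect of the output alphabet size on this example, the encoded length can be computed without building the codes, since it equals the sum of the weights of all merged nodes. The sketch below is only an illustration under that observation; the name encoded_length is hypothetical.

```python
import heapq
from collections import Counter


def encoded_length(text, k):
    """Number of output symbols a k-ary Huffman code needs for text."""
    weights = list(Counter(text).values())
    if len(weights) <= 1:
        # Zero or one distinct symbol: one output symbol per input symbol.
        return len(text)
    # Pad with zero weights so every merge takes exactly k nodes.
    while (len(weights) - 1) % (k - 1) != 0:
        weights.append(0)
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = sum(heapq.heappop(weights) for _ in range(k))
        total += merged  # each merge adds one symbol to every letter below it
        heapq.heappush(weights, merged)
    return total


for k in (2, 3, 4):
    print(k, encoded_length("mississippi", k))
# prints:
# 2 21
# 3 14
# 4 11
```

For “mississippi” this gives 21 symbols with the binary alphabet, 14 with three output symbols, and 11 with four, which is the length of the input itself. Note that the saving is in the number of output symbols: a ternary symbol carries about 1.58 bits, so 14 ternary symbols correspond to roughly 22 bits.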

4). As explained above, the same reasoning applies when the input is made of words rather than single letters: with an output alphabet of size 3, the shorter codes help compress the text.

 
