How to write Huffman coding to a file using Python?

Tags:

I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:

string = 'The quick brown fox jumps over the lazy dog'

Running my algorithm returns the following 'bits':

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

By comparing the amount of bits of the result with the input string, the algorithm seems to work:

>>> print len(result), len(string) * 8
194 344

But now comes the question: how do I write this to a file, while still being able to decode it. You can only write to a file per byte, not per bit. By writing the 'codes' as bytes, there is no compression at all!

I am new at computer science, and the online resources just don't cut it for me. All help is much appreciated!

Edit: note that I had my codes something like this (in case of another input string 'xxxxxxxyzz'):

{'y': '00', 'x': '1', 'z': '10'}

The way I create the resulting string is by concatenating these codes in order of the input string:

result = '1111111001010'

How to get back to the original string from this result? Or am I getting this completely wrong? Thank you!

354

asked Jul 19 '18 14:07

Pim Klaassen

1 Answers

First you need to convert your input string to bytes:

def _to_Bytes(data):
  b = bytearray()
  for i in range(0, len(data), 8):
    b.append(int(data[i:i+8], 2))
  return bytes(b)

Then, open a file to write in binary mode:

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'
with open('test.bin', 'wb') as f:
  f.write(_to_Bytes(result))

Now, writing the original string to a file, a comparison of bytes can take place:

import os
with open('test_compare.txt', 'a') as f:
  f.write('The quick brown fox jumps over the lazy dog')

_o = os.path.getsize('test_compare.txt')
_c = os.path.getsize('test.bin')
print(f'Original file: {_o} bytes')
print(f'Compressed file: {_c} bytes')
print('Compressed file to about {}% of original'.format(round((((_o-_c)/_o)*100), 0)))

Output:

Original file: 43 bytes
Compressed file: 25 bytes
Compressed file to about 42.0% of original

To get back to the original, you can write a function that determines the possible ordering of characters:

d = {'y': '00', 'x': '1', 'z': '10'}
result = '1111111001010'
from typing import Generator
def reverse_encoding(content:str, _lookup) -> Generator[str, None, None]:
  while content:
    _options = [i for i in _lookup if content.startswith(i) and (any(content[len(i):].startswith(b) for b in _lookup) or not content[len(i):])]
    if not _options:
      raise Exception("Decoding error")
    yield _lookup[_options[0]]
    content = content[len(_options[0]):]

print(''.join(reverse_encoding(result, {b:a for a, b in d.items()})))

Output:

'xxxxxxxyzz'

130

answered Sep 29 '22 00:09

Ajax1234

Related questions
                            
                                Python argparse default values not working
                            
                                Duplicating a list based on elements in its list of lists
                            
                                Color axis labels for pandas plotting on dark theme notebook
                            
                                Differences in three ways to define a abstract class
                            
                                Format Excel Column header for better visibility and Color
                            
                                Argparse in Jupyter Notebook throws a TypeError
                            
                                what is difference between .map(str) and .astype(str) in dataframe
                            
                                How to reuse tensorflow variables in eager execution mode?
                            
                                Pandas left join DataFrames by two columns
                            
                                OpenCV: How to correctly apply morphologyEx operation?
                            
                                Crystal-lang Accessing Serial port
                            
                                How to find parent tags of an element with BeautifulSoup?
                            
                                how to get frames with timestamp where frames are converted from video in python
                            
                                Fastest way to read huge MySQL table in python
                            
                                Pydub unable to locate ffprobe
                            
                                latex jupyter notebook it does not work
                            
                                I have to check if the string contains: alphanumeric, alphabetical , digits, lowercase and uppercase characters
                            
                                drops a column if it exceeds a specific number of NA values
                            
                                Why does {1} == frozenset({1}) in Python?
                            
                                Why does tf.Print() not work?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to write Huffman coding to a file using Python?

Tags:

python

encoding

compression

huffman-code

Pim Klaassen

People also ask

1 Answers

Ajax1234

Recent Activity

Donate For Us