I'm fairly new to coding and I'm having difficulty creating a huffman algorithm to encode and decode text files. I understand most of the concepts pretty well but there's not much on exactly how you would create and traverse the tree.
Here's my code so far:
with open(input('enter a file: ')) as name:
fh = name.read()
print(fh)
#create the frequency dicitonary
freqdict = {}
for ch in fh:
if ch in freqdict:
freqdict[ch] += 1
else:
freqdict[ch] = 1
freqdict = sorted(freqdict.items(), key = lambda x:
x[1], reverse = True)
print(freqdict)
class Node:
def __init__(self, left = None, right = None,
data):
self.left = left
self.right = right
self.data = data
def children(self):
return (self.left, self.right)
def nodes(self):
return (self.left, self.right)
def __str__(self):
return str(self.left, self.right)
Modified version of : https://rosettacode.org/wiki/Huffman_coding#Python
This is a Huffman encoder/decoder for any message in 'txt'
This encodes the txt message into a shortened binary variable for storage ( You can store the compressed_binary to disk. You can also decode the compressed_binary using decompressHuffmanCode, which recreates the original string from the compressed string of compressed_binary
from heapq import heappush, heappop, heapify
from collections import defaultdict
from functools import reduce
def encode(symb2freq):
heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
heapify(heap)
while len(heap) > 1:
lo = heappop(heap)
hi = heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return dict(sorted(heappop(heap)[1:], key=lambda p: (p, len(p[-1]))))
# recreates the original message from your huffman code table
# uncomment print(a) to see how it works
def decompressHuffmanCode(a, bit):
# print(a)
return ('', a[1] + s[a[0]+bit[0]]) if (a[0]+bit[0] in s) else (a[0]+bit[0], a[1])
txt="CompresssionIsCoolWithHuffman"
# Create symbol to frequency table
symb2freq = defaultdict(int)
for ch in txt:
symb2freq[ch] += 1
enstr=encode(symb2freq)
# Create Huffman code table from frequency table
s=dict((v,k) for k,v in dict(enstr).items())
# Create compressible binary. We add 1 to the front, and remove it when read from disk
compressed_binary = '1' + ''.join([enstr[item] for item in txt])
# Read compressible binary so we can uncompress it. We strip the first bit.
read_compressed_binary = compressed_binary[1:]
# Recreate the compressed message from read_compressed_binary
remainder,bytestr = reduce(decompressHuffmanCode, read_compressed_binary, ('', ''))
print(bytestr)
which results in:
CompresssionIsCoolWithHuffman
This is a quick implementation that should help. Things that can be dealt for programmatically is the buffer, but i just wanted to show you a quick implementation using your frequency codes
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With