Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a binary tree from a frequency dictionary

I'm fairly new to coding and I'm having difficulty creating a huffman algorithm to encode and decode text files. I understand most of the concepts pretty well but there's not much on exactly how you would create and traverse the tree.

Here's my code so far:

with open(input('enter a file: ')) as name:
    fh = name.read()
    print(fh)

#create the frequency dicitonary
freqdict = {}
for ch in fh:
    if ch in freqdict:
        freqdict[ch] += 1
    else:
        freqdict[ch] = 1
freqdict = sorted(freqdict.items(), key = lambda x: 
x[1], reverse = True)
print(freqdict)

class Node:
    def __init__(self, left = None, right = None, 
data):
        self.left = left
        self.right = right
        self.data = data

    def children(self):
        return (self.left, self.right)

    def nodes(self):
        return (self.left, self.right)

    def __str__(self):
        return str(self.left, self.right)

1 Answers

Modified version of : https://rosettacode.org/wiki/Huffman_coding#Python

This is a Huffman encoder/decoder for any message in 'txt'

This encodes the txt message into a shortened binary variable for storage ( You can store the compressed_binary to disk. You can also decode the compressed_binary using decompressHuffmanCode, which recreates the original string from the compressed string of compressed_binary

from heapq import heappush, heappop, heapify
from collections import defaultdict
from functools import reduce

def encode(symb2freq):
    heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
    heapify(heap)
    while len(heap) > 1:
        lo = heappop(heap)
        hi = heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(sorted(heappop(heap)[1:], key=lambda p: (p, len(p[-1]))))

# recreates the original message from your huffman code table 
# uncomment print(a) to see how it works
def decompressHuffmanCode(a, bit):
    # print(a)
    return ('', a[1] + s[a[0]+bit[0]]) if (a[0]+bit[0] in s) else (a[0]+bit[0], a[1])

txt="CompresssionIsCoolWithHuffman"

# Create symbol to frequency table
symb2freq = defaultdict(int)
for ch in txt:
    symb2freq[ch] += 1
enstr=encode(symb2freq)

# Create Huffman code table from frequency table
s=dict((v,k) for k,v in dict(enstr).items())

# Create compressible binary. We add 1 to the front, and remove it when read from disk
compressed_binary = '1' + ''.join([enstr[item] for item in txt])

# Read compressible binary so we can uncompress it. We strip the first bit.
read_compressed_binary = compressed_binary[1:]

# Recreate the compressed message from read_compressed_binary
remainder,bytestr = reduce(decompressHuffmanCode, read_compressed_binary, ('', ''))
print(bytestr)

which results in:

CompresssionIsCoolWithHuffman

This is a quick implementation that should help. Things that can be dealt for programmatically is the buffer, but i just wanted to show you a quick implementation using your frequency codes

like image 118
oppressionslayer Avatar answered May 31 '26 16:05

oppressionslayer



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!