Huffman suffix-code

I'm trying to efficiently construct a binary suffix code (i.e. a set of code words, none of which is a suffix of any other) for a given set of characters with their probabilities.

My basic idea is to construct a prefix code using an implementation of the Huffman algorithm and then reverse the code words, which gives a suffix-free code. While this solution works, it does not seem optimal, because I have to reverse variable-length code words (so I need a lookup table combined with bit shifts).

Is there any way to modify the Huffman algorithm in order to create a suffix-code more efficiently?

Tobias Geiselmann asked Feb 07 '17 09:02


1 Answer

I would implement the HuffmanNode as

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class HuffmanNode implements Comparable<HuffmanNode>
{
    // data
    private String text;
    private double frequency;

    // linkage
    private HuffmanNode left;
    private HuffmanNode right;
    private HuffmanNode parent;

    public HuffmanNode(String text, double frequency)
    {
        this.text = text;
        this.frequency = frequency;
    }
    public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
    {
        if(n0.frequency < n1.frequency)
        {
            left = n0;
            right = n1;
        }else if(n0.frequency > n1.frequency)
        {
            left = n1;
            right = n0;
        }else
        {
            if(n0.text.compareTo(n1.text) < 0)
            {
                left = n0;
                right = n1;
            }else
            {
                left = n1;
                right = n0;
            }
        }
        left.parent = this;
        right.parent = this;
        text = left.text + right.text;
        frequency = left.frequency + right.frequency;
    }

    public HuffmanNode getParent() {
        return parent;
    }

    public HuffmanNode getLeft() {
        return left;
    }

    public HuffmanNode getRight() {
        return right;
    }

    public String getText()
    {
        return text;
    }

    @Override
    public int compareTo(HuffmanNode o) {
        if(frequency < o.frequency)
            return -1;
        else if(frequency > o.frequency)
            return 1;
        else
            return text.compareTo(o.text);
    }

    // Collects all leaves of the (sub)tree rooted at this node.
    public Collection<HuffmanNode> leaves()
    {
        Set<HuffmanNode> retval = new HashSet<>();
        if(left == null && right == null)
        {
            // a node without children is itself a leaf
            retval.add(this);
            return retval;
        }
        // internal node: only its descendants can be leaves
        if(left != null)
            retval.addAll(left.leaves());
        if(right != null)
            retval.addAll(right.leaves());
        return retval;
    }

    @Override
    public String toString()
    {
        return "{" + text + " -> " + frequency + "}";
    }
}

This class represents a single node in a Huffman tree.
It has a convenience method, leaves(), for getting all the leaves of a (sub)tree.

You can then easily build the tree:

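// requires java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map;
// frequency(text) is assumed to map each symbol to its frequency in the text
private Map<String,String> buildTree(String text)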
{
    List<HuffmanNode> nodes = new ArrayList<>();
    for(Map.Entry<String,Double> en : frequency(text).entrySet())
    {
        nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
    }
    java.util.Collections.sort(nodes);
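    // repeatedly merge the two lowest-frequency nodes until only the root remains
    while(nodes.size() > 1)
    {
        // the list is kept sorted, so the two smallest nodes are at the front
        HuffmanNode n0 = nodes.remove(0);
        HuffmanNode n1 = nodes.remove(0);

        // build merged node
        HuffmanNode newNode = new HuffmanNode(n0, n1);

        // calculate insertion point; binarySearch returns -(insertionPoint) - 1
        // when no equal node is in the list yet
        int index = java.util.Collections.binarySearch(nodes, newNode);
        int insertionPoint = index < 0 ? -index - 1 : index;

        // insert, keeping the list sorted
        nodes.add(insertionPoint, newNode);
    }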

    // build lookup table
    Map<String, String> lookupTable = new HashMap<>();
    for(HuffmanNode leaf : nodes.iterator().next().leaves())
    {
        String code = "";
        HuffmanNode tmp = leaf;
        while(tmp.getParent() != null)
        {
            if(tmp.getParent().getLeft() == tmp)
                code = "0" + code;
            else
                code = "1" + code;
            tmp = tmp.getParent();
        }
        lookupTable.put(leaf.getText(), code);
    }
    return lookupTable;
}
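
buildTree calls a frequency(text) helper whose body is not part of this answer. A minimal sketch of what it might look like, assuming each symbol is a single char of the input and the value is its relative frequency (the class name FrequencySketch is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

final class FrequencySketch {

    // Hypothetical helper: maps each character of the text (as a
    // one-character String) to its relative frequency in the text.
    static Map<String, Double> frequency(String text) {
        // first count occurrences with integers to avoid rounding drift
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            counts.merge(String.valueOf(text.charAt(i)), 1, Integer::sum);
        }

        // then normalise the counts so that the values sum to 1
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> en : counts.entrySet()) {
            result.put(en.getKey(), en.getValue() / (double) text.length());
        }
        return result;
    }
}
```

Only the relative order of the values matters to the Huffman construction, so raw counts would work just as well as normalised frequencies.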

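The lookup-table loop walks from each leaf up to the root and prepends each bit, which yields the usual prefix code. By changing how the code word is built (for instance appending each bit rather than prepending it, i.e. code = code + "0"), the code words come out reversed, which gives a suffix-free code directly, without a separate reversal step.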
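
As a rough, self-contained sketch of that idea: walking from a leaf up to the root and appending each branch bit (instead of prepending it, as the lookup-table loop above does) produces every code word already reversed, so the result is suffix-free. The class below is not the HuffmanNode code from this answer; SuffixCodeSketch, Node, buildSuffixCode and isSuffixFree are names invented for illustration, and a java.util.PriorityQueue is swapped in for the sorted list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class SuffixCodeSketch {

    // Minimal tree node; a simplified stand-in for HuffmanNode.
    static final class Node {
        final String symbol;   // null for internal nodes
        final double weight;
        final Node left;
        final Node right;
        Node parent;

        Node(String symbol, double weight) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = null;
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
            left.parent = this;
            right.parent = this;
        }
    }

    // Builds a Huffman tree and reads every code word from leaf to root.
    static Map<String, String> buildSuffixCode(Map<String, Double> weights) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>(Comparator.comparingDouble((Node n) -> n.weight));
        List<Node> leaves = new ArrayList<>();
        for (Map.Entry<String, Double> en : weights.entrySet()) {
            Node leaf = new Node(en.getKey(), en.getValue());
            leaves.add(leaf);
            queue.add(leaf);
        }

        // standard Huffman merging: join the two lightest nodes until one is left
        while (queue.size() > 1) {
            Node lighter = queue.poll();
            Node heavier = queue.poll();
            queue.add(new Node(lighter, heavier));
        }

        Map<String, String> table = new HashMap<>();
        for (Node leaf : leaves) {
            StringBuilder code = new StringBuilder();
            // APPEND while walking upwards: the bit nearest the leaf comes first,
            // so each word is the reverse of the usual prefix code word
            for (Node n = leaf; n.parent != null; n = n.parent) {
                code.append(n.parent.left == n ? '0' : '1');
            }
            table.put(leaf.symbol, code.toString());
        }
        return table;
    }

    // Returns true if no code word is a suffix of (or equal to) another one.
    static boolean isSuffixFree(Map<String, String> table) {
        for (Map.Entry<String, String> a : table.entrySet()) {
            for (Map.Entry<String, String> b : table.entrySet()) {
                if (!a.getKey().equals(b.getKey())
                        && b.getValue().endsWith(a.getValue())) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With the weights a=45, b=30, c=15, d=5 (no ties, so the tree shape is fixed), this yields a=0, b=11, c=101, d=001. Note that a single-symbol input gets the empty string as its code word, which a real encoder would need to special-case.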

Joris Schellekens answered Nov 10 '22 20:11