I'm trying to efficiently construct a binary suffix code for a given set of characters with their probabilities (i.e. a set of words none of which is a suffix of any other).
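The defining property can be checked directly. Below is a minimal sketch that tests whether a set of codewords is suffix-free by comparing every pair with `String.endsWith`. The class and method names are hypothetical and chosen only for illustration:

```java
import java.util.List;

// Hypothetical helper: checks the suffix-free property of a set of codewords.
final class SuffixCheck {
    // Returns true if no codeword is a suffix of a different codeword.
    static boolean isSuffixFree(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            for (int j = 0; j < words.size(); j++) {
                if (i != j && words.get(j).endsWith(words.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSuffixFree(List.of("0", "01", "11"))); // true
        System.out.println(isSuffixFree(List.of("1", "01")));       // false: "01" ends with "1"
    }
}
```

The pairwise check is quadratic in the number of codewords, which is fine for validating a code table but not meant for use in a hot path.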
My basic idea is to construct a prefix code using an implementation of the Huffman algorithm. Reversing the codewords then gives a suffix-free code. This solution works, but it does not seem optimal, because I have to reverse variable-length codewords (which needs a lookup table combined with bit shifts).
Is there any way to modify the Huffman algorithm in order to create a suffix-code more efficiently?
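Regarding the reversal cost mentioned in the question: reversing a codeword stored in an `int` does not necessarily need a lookup table. `Integer.reverse` mirrors all 32 bits, and an unsigned right shift then drops the padding. The sketch below assumes that codewords are at most 32 bits long and stored right-aligned with zeros above them. That bit-packing layout is an assumption, not something stated in the question:

```java
// Sketch: reverse the low `length` bits of a right-aligned codeword.
final class CodewordReverse {
    // Assumes 1 <= length <= 32 and that all bits above `length` are zero.
    static int reverseBits(int code, int length) {
        // Integer.reverse mirrors all 32 bits, moving the codeword to the top;
        // shifting right by (32 - length) brings it back down to the low bits.
        return Integer.reverse(code) >>> (32 - length);
    }

    public static void main(String[] args) {
        int code = 0b0011;               // codeword "0011", length 4
        int rev = reverseBits(code, 4);  // codeword "1100"
        System.out.println(Integer.toBinaryString(rev)); // 1100
    }
}
```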
I would implement HuffmanNode as follows:
```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class HuffmanNode implements Comparable<HuffmanNode>
{
    // data
    private String text;       // symbol(s) covered by this subtree
    private double frequency;  // total frequency of those symbols

    // linkage
    private HuffmanNode left;
    private HuffmanNode right;
    private HuffmanNode parent;

    // Leaf node for a single symbol.
    public HuffmanNode(String text, double frequency)
    {
        this.text = text;
        this.frequency = frequency;
    }

    // Internal node merging two subtrees.
    public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
    {
        // the smaller node (by frequency, ties broken by text) becomes the left child
        if(n0.compareTo(n1) < 0)
        {
            left = n0;
            right = n1;
        }else
        {
            left = n1;
            right = n0;
        }
        left.parent = this;
        right.parent = this;
        text = left.text + right.text;
        frequency = left.frequency + right.frequency;
    }

    public HuffmanNode getParent() {
        return parent;
    }

    public HuffmanNode getLeft() {
        return left;
    }

    public HuffmanNode getRight() {
        return right;
    }

    public String getText()
    {
        return text;
    }

    // Order by frequency, then by text, so that sorting is deterministic.
    @Override
    public int compareTo(HuffmanNode o) {
        if(frequency < o.frequency)
            return -1;
        else if(frequency > o.frequency)
            return 1;
        else
            return text.compareTo(o.text);
    }

    // Collects all leaves (symbol nodes) of the subtree rooted at this node.
    public Collection<HuffmanNode> leaves()
    {
        Set<HuffmanNode> retval = new HashSet<>();
        if(left == null && right == null)
        {
            retval.add(this);
        }
        else
        {
            if(left != null)
                retval.addAll(left.leaves());
            if(right != null)
                retval.addAll(right.leaves());
        }
        return retval;
    }

    @Override
    public String toString()
    {
        return "{" + text + " -> " + frequency + "}";
    }
}
```
This class represents a single node in a Huffman tree.
It has a convenience method for collecting all the leaves of a (sub)tree.
You can then build the tree and derive a code table from it:
```java
// Requires java.util.* imports in the enclosing class's file.
// frequency(text) is not shown here; it is assumed to return each symbol
// mapped to its frequency.
private Map<String,String> buildTree(String text)
{
    List<HuffmanNode> nodes = new ArrayList<>();
    for(Map.Entry<String,Double> en : frequency(text).entrySet())
    {
        nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
    }
    Collections.sort(nodes);

    // repeatedly merge the two lightest nodes until only the root remains
    while(nodes.size() > 1)
    {
        HuffmanNode n0 = nodes.remove(0);
        HuffmanNode n1 = nodes.remove(0);
        // build merged node
        HuffmanNode newNode = new HuffmanNode(n0, n1);
        // calculate insertion point (binarySearch returns -(insertionPoint) - 1 if not found)
        int pos = Collections.binarySearch(nodes, newNode);
        int insertionPoint = pos >= 0 ? pos : -pos - 1;
        // insert, keeping the list sorted
        nodes.add(insertionPoint, newNode);
    }

    // build lookup table: walk from each leaf up to the root, prepending the branch bit
    Map<String, String> lookupTable = new HashMap<>();
    for(HuffmanNode leaf : nodes.get(0).leaves())
    {
        String code = "";
        HuffmanNode tmp = leaf;
        while(tmp.getParent() != null)
        {
            if(tmp.getParent().getLeft() == tmp)
                code = "0" + code;
            else
                code = "1" + code;
            tmp = tmp.getParent();
        }
        lookupTable.put(leaf.getText(), code);
    }
    return lookupTable;
}
```
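`buildTree` calls a `frequency(String)` helper that is not shown above. Here is a minimal sketch of what it is assumed to do: map each distinct character, as a one-character `String`, to its relative frequency. The name, the per-`char` granularity (surrogate pairs are not handled), and the normalization by text length are all assumptions:

```java
import java.util.HashMap;
import java.util.Map;

final class SymbolFrequency {
    // Hypothetical stand-in for the frequency(text) helper used by buildTree:
    // maps each distinct UTF-16 char to count / text.length().
    static Map<String, Double> frequency(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(String.valueOf(c), 1, Integer::sum);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) text.length());
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 of the 11 characters are 'a'
        System.out.println(frequency("abracadabra").get("a"));
    }
}
```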
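By changing the loop that builds each code, you can change the codes being produced. Because that loop walks from the leaf up to the root, appending each digit (code = code + "0") instead of prepending it produces every codeword reversed. The lookup table then directly contains the suffix-free code, and no separate reversal step is needed.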
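The leaf-to-root walk with appended bits can be shown in a compact, self-contained form. The following sketch is a variant of the approach above, not a replacement for it. It swaps the sorted list for a `java.util.PriorityQueue`, and its class and method names are hypothetical. Tie-breaking between equal weights is left to the queue, so the exact bits may vary, but the codeword lengths are those of a Huffman code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class SuffixHuffman {
    private static final class Node {
        final String symbol;  // non-null only for leaves
        final double weight;
        Node left, right, parent;

        Node(String symbol, double weight) {
            this.symbol = symbol;
            this.weight = weight;
        }
    }

    // Builds a Huffman tree, then reads each codeword from leaf to root,
    // appending bits, which yields the reversed (suffix-free) Huffman code.
    static Map<String, String> suffixCode(Map<String, Double> weights) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>(Comparator.comparingDouble((Node n) -> n.weight));
        List<Node> leaves = new ArrayList<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            Node leaf = new Node(e.getKey(), e.getValue());
            leaves.add(leaf);
            queue.add(leaf);
        }
        // Standard Huffman merging: repeatedly join the two lightest subtrees.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            Node parent = new Node(null, a.weight + b.weight);
            parent.left = a;
            parent.right = b;
            a.parent = parent;
            b.parent = parent;
            queue.add(parent);
        }
        Map<String, String> codes = new HashMap<>();
        for (Node leaf : leaves) {
            StringBuilder code = new StringBuilder();
            // Walking upward visits the bits in leaf-to-root order; appending them
            // (instead of prepending) gives the reversed prefix codeword directly.
            for (Node n = leaf; n.parent != null; n = n.parent) {
                code.append(n.parent.left == n ? '0' : '1');
            }
            codes.put(leaf.symbol, code.toString());
        }
        return codes;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("a", 0.5);
        weights.put("b", 0.25);
        weights.put("c", 0.125);
        weights.put("d", 0.125);
        System.out.println(suffixCode(weights));
    }
}
```

As in the original loop, an input with a single symbol gets the empty codeword, so that case needs separate handling if it can occur.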