Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How best to search binary data for variable length bit strings?

Can anyone tell me the best way to decode binary data with variable length bit strings in java?

For example:

The binary data is 10101000 11100010 01100001 01010111 01110001 01010110

I might need to find the first match of any of the following 01, 100, 110, 1110, 1010...

In this case the match would be 1010. I then need to do the same for the remainder of the binary data. The bit strings can be up to 16 bits long and cross the byte boundaries.

Basically, I'm trying to Huffman decode jpegs using the bit strings I created from the Huffman tables in the headers. I can do it, only it's very messy, I'm turning everything, binary data included, into Stringbuffers first and I know that isn't the right way.

Before I loaded everything in string buffers I tried using just numbers in binary but of course I can't ignore the leading 0s in a code like 00011. I'm sure there must be some clever way using bit wise operators and the like to do this, but I've been staring at pages explaining bit masks and leftwise shifts etc and I still don't have a clue!

Thanks a lot for any help!

EDIT:

Thanks for all the suggestions. I've gone with the binary tree approach as it seems to be the standard way with Huffman stuff. Makes sense really as Huffman codes are created using trees. I'll also look into to storing the binary data I need to search in a big integer. Don't know how to mark multiple answers as correct, but thanks all the same.

like image 876
joinJpegs Avatar asked Jan 24 '23 18:01

joinJpegs


2 Answers

You might use a state machine consuming zeros and ones. The state machine would have final states for all the patterns that you want to detect. Whenever it enters one of the final states, is sends a message to you with the matched pattern and goes back to the initial state.

Finally you would have only one state machine in form of a DAG which contains all your patterns.

To implement it use the state pattern (http://en.wikipedia.org/wiki/State_pattern) or any other implementation of a state machine.

like image 129
paweloque Avatar answered Jan 30 '23 01:01

paweloque


Since you are decoding Huffman encoded-data, you should create a binary tree, where leaves hold the decoded bit string as data, and the bits of each Huffman code are the path to the corresponding data. The bits of the Huffman code are accessed with bit-shift and bit-mask operations. When you get to a leaf, you output the data at that leaf and go back to the root of the tree. It's very fast and efficient.

like image 31
erickson Avatar answered Jan 30 '23 00:01

erickson