Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Variable length integer encoding

I am attempting to reverse engineer an LZ1/LZ77 decompression algorithm. The length of an area of the decode buffer/window to be output is encoded in the file as a variable length integer. I've read as much as I can about variable length integer encoding and the method being used in this case does not appear to be like any others I have seen. Perhaps to avoid patent issues or maybe just to obfuscate. The included code might not be quite complete but it is working on at least several files at this point.

I cannot see how, if at all, the formulas being used below could be reduced into something simpler. Most of the variable length integer encoding algorithms use some sort of loop but for this one, I have not been able to do that because the formula just doesn't seem to be consistent when evaluating each nibble.

Suggestions are greatly appreciated.

private static int getLength(BitReader bitStream)
{
    const int minSize = 2;

    int length = 0;

    byte nibble3, nibble2, nibble1;

    nibble3 = bitStream.ReadNibble();

    if (nibble3 >= 0xc)
    {
        nibble2 = bitStream.ReadNibble();
        nibble1 = bitStream.ReadNibble();

        if (nibble3 == 0xF & nibble2 == 0xF & nibble1 == 0xF) return -1;

        if ((nibble3 & 2) != 0)
        {
            length = (((((nibble3 & 7) + 3) << 6) + 8)) + 
                ((nibble2 & 7) << 3) + nibble1 + minSize;
        }
        else if ((nibble3 & 1) != 0)
        {
            length = (((nibble3 & 7) << 6) + 8) + 
                ((((nibble2 & 7)) + 1) << 3) + nibble1 + minSize;
        }
        else
        {
            length = ((((nibble3 & 7) << 4) + 8)) + 
                ((nibble2 & 7) << 4) + nibble1 + minSize;
        }
    }
    else if ((nibble3 & 8) != 0)
    {
        nibble1 = bitStream.ReadNibble();

        length = ((((nibble3 & 7) << 1) + 1) << 3) + nibble1 + minSize;
    }
    else
    {
        length = nibble3 + minSize;
    }

    return length;
}
like image 588
Richard Collette Avatar asked Mar 01 '10 06:03

Richard Collette


1 Answers

It turns out that the variable length integer encoding algorithm being used is very similar to the Dlugosz' Variable-Length Integer Encoding method. Indeed, there are multiple calculations that are required, rather than a single formula.

Based on that, I re-wrote the code as follows. I am still trying to figure out the exact format of the mechanism where a leading 0xFFF is used.

    private static int getLength(BitReader bitStream)
    {
        const int minSize = 2;
        int length = 0;
        byte nibble3, nibble2, nibble1;
        byte nibble;
        nibble = bitStream.ReadNibble();
        if (nibble == 0xF)
        {
            nibble2 = bitStream.ReadNibble();
            nibble1 = bitStream.ReadNibble();
            if (nibble2 == 0xf && nibble1 == 0xF)
            {
                //The next nibble specifies the number of nibbles to be read, maybe.
                byte nibblesToRead = (byte) (bitStream.ReadNibble()) ;
                //The Dlugosz' mechanism would use a mask on the value but that doesn't appear to be the case here.
                //nibblesToRead &= 7;
                //switch (nibblesToRead & 7){
                //    case 0: nibblesToRead = 5; break;
                //    case 1: nibblesToRead = 8; break;
                //    case 2: nibblesToRead = 16; break;                           
                //}
                byte value=0;
                byte[] values = new byte[nibblesToRead];
                bool c=true;
                for (int i = 0; i < nibblesToRead; i++)
                {
                    value = bitStream.ReadNibble();
                    //values[i] = value;
                    length += (((value << 1) | 1) << 3);
                }
                value = bitStream.ReadNibble();
                length += value;
            }
        }
        else if((nibble >= 0xC)){
           nibble2 = bitStream.ReadNibble();
           nibble1 = bitStream.ReadNibble();
           length = ((((((nibble & 1) <<1)|1))<< 3) + ((nibble2<<1)|1)<<3)+nibble1;
        }
        else if ((nibble & 8)!=0){
            nibble1 = bitStream.ReadNibble();
            length = ((((nibble & 3)<<1) | 1) << 3) + nibble1;
        }
        else{
            length=nibble;
        }
        return length + minSize;
      };
like image 118
Richard Collette Avatar answered Nov 01 '22 23:11

Richard Collette