
Efficient algorithm for finding a byte in a bit array

Given a byte array uint8_t data[N], what is an efficient way to find a byte uint8_t search within it, even if search is not octet-aligned? For example, the first three bits of search could be in data[i] and the next five bits in data[i+1].

My current method uses a function bool get_bit(const uint8_t* src, struct internal_state* state). The struct internal_state holds a mask that is shifted right and ANDed with src to produce the returned bit, while keeping size_t src_index < size_t src_len. I left-shift each returned bit into a uint8_t my_register, compare my_register with search after every bit, and use state->src_index and state->src_mask to get the position of the matched byte.

Is there a better method for this?
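For reference, the bit-at-a-time approach described in the question could look roughly like the sketch below. The struct layout, the MSB-first bit order, the bits_read counter and the name find_byte_bitwise are assumptions made for illustration; the question does not show the actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state, modeled on the description in the question. */
struct internal_state {
    size_t  src_index;  /* index of the byte currently being read */
    size_t  src_len;    /* total number of bytes in src */
    uint8_t src_mask;   /* single-bit mask, starts at 0x80 (MSB first) */
};

/* Return the next bit of src; the caller must ensure src_index < src_len. */
bool get_bit(const uint8_t *src, struct internal_state *state)
{
    bool bit = (src[state->src_index] & state->src_mask) != 0;

    state->src_mask >>= 1;
    if (state->src_mask == 0) {     /* byte exhausted, move to the next one */
        state->src_mask = 0x80;
        state->src_index++;
    }
    return bit;
}

/* Return the bit offset of the first occurrence of search, or -1 if none. */
long find_byte_bitwise(const uint8_t *src, size_t src_len, uint8_t search)
{
    struct internal_state state = { 0, src_len, 0x80 };
    uint8_t my_register = 0;
    size_t bits_read = 0;           /* assumption: a plain counter tracks the position */

    while (state.src_index < state.src_len) {
        my_register = (uint8_t)((my_register << 1) | (get_bit(src, &state) ? 1 : 0));
        bits_read++;
        /* the register only holds a full byte after 8 bits have been read */
        if (bits_read >= 8 && my_register == search)
            return (long)(bits_read - 8);
    }
    return -1;
}
```

This does one function call and one comparison per bit, which is the cost the question is trying to reduce.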

user80551 asked May 11 '15 18:05

1 Answer

If you're searching for an eight-bit pattern within a large array, you can slide a window over 16-bit values and check whether the pattern occurs anywhere within the two bytes that form each value.

To be portable you have to take care of endianness. My implementation does this by building the 16-bit value manually: the high byte is always the byte currently being iterated and the low byte is the one that follows. A simple conversion like value = *((unsigned short *)pData) will run into trouble on x86 processors, which are little-endian and would put the first byte into the low half of value.
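As a small illustration of the manual construction, the helper below (make_window is a hypothetical name, not part of the answer's code) combines two adjacent bytes with shifts, so the result is the same on little-endian and big-endian machines. It also sidesteps the alignment and aliasing problems that the pointer cast can cause.

```c
#include <stdint.h>

/* Combine two adjacent bytes into one 16-bit window.
 * p[0] always ends up in the high half, regardless of host byte order.
 * The caller must guarantee that p[1] is readable. */
uint16_t make_window(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}
```

For the bytes {0x12, 0x34} this yields 0x1234, whereas the pointer cast would read 0x3412 on x86.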

Once value, cmp and mask are set up, cmp and mask are shifted right one bit at a time, and value & mask is compared with cmp at each step. If the pattern is not found starting within the high byte, the loop continues with the next byte as the start byte.
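The shifting step on its own can be sketched as follows. The function name match_in_window is made up for this example; the logic mirrors the inner loop of the implementation, checking the eight start positions that lie in the high byte.

```c
#include <stdint.h>

/* Return the bit offset (0..7) at which pattern starts inside value,
 * or -1 if it does not start within the high byte. */
int match_in_window(uint16_t value, uint8_t pattern)
{
    uint16_t cmp  = (uint16_t)(pattern << 8);  /* pattern aligned to the high byte */
    uint16_t mask = 0xFF00;                    /* selects the 8 bits under test */
    int offset;

    for (offset = 0; offset < 8; offset++) {
        if ((value & mask) == cmp)
            return offset;
        mask >>= 1;   /* move the window one bit towards the low byte */
        cmp  >>= 1;   /* keep the pattern aligned with the window */
    }
    return -1;        /* offset 8 (the low byte) belongs to the next window */
}
```

For value 0x0AB0 and pattern 0xAB the match is found at offset 4, after mask has become 0x0FF0 and cmp 0x0AB0.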

Here is my implementation, including some debug printouts (the function returns the bit position, or -1 if the pattern was not found):

#include <stdio.h>   /* printf, used by the debug printouts */

int findPattern(unsigned char *data, int size, unsigned char pattern)
{
    int result = -1;
    unsigned char *pData;
    unsigned char *pEnd;
    unsigned short value;
    unsigned short mask;
    unsigned short cmp;
    int tmpResult;



    if ((data != NULL) && (size > 0))
    {
        pData = data;
        pEnd = data + size;

        while ((pData < pEnd) && (result == -1))
        {
            if ((pData + 1) < pEnd)   /* still at least two bytes to check? */
            {
                /* debug output; pData[1] is only read when a second byte exists */
                printf("\n\npData = {%02x, %02x, ...};\n", pData[0], pData[1]);

                tmpResult = (int)(pData - data) * 8;   /* calculate bit offset according to current byte */

                /* avoid endianness troubles by "manually" building value! */
                value = *pData << 8;
                pData++;
                value += *pData;

                /* create a sliding window to check if search pattern is within value */
                cmp = pattern << 8;
                mask = 0xFF00;
                while (mask > 0x00FF)   /* the low byte is checked within next iteration! */
                {
                    printf("cmp = %04x, mask = %04x, tmpResult = %d\n", cmp, mask, tmpResult);

                    if ((value & mask) == cmp)
                    {
                        result = tmpResult;
                        break;
                    }

                    tmpResult++;   /* count bits! */
                    mask >>= 1;
                    cmp >>= 1;
                }
            }
            else
            {
                /* only one chance left if there is only one byte left to check! */
                if (*pData == pattern)
                {
                    result = (int)(pData - data) * 8;
                }

                pData++;
            }
        }
    }

    return (result);
}
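Since the function returns a bit position, one way to sanity-check a result is to read back the eight bits starting at that position and compare them with the pattern. The helper below is a sketch for that purpose; byte_at_bit is a hypothetical name and it assumes the same MSB-first bit numbering as the function above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 8 bits starting at bit_pos (bit 0 is the MSB of data[0]).
 * The caller must ensure that all 8 bits lie inside data[0..size-1]. */
uint8_t byte_at_bit(const uint8_t *data, size_t size, size_t bit_pos)
{
    size_t   index  = bit_pos / 8;   /* byte that holds the first bit */
    unsigned shift  = bit_pos % 8;   /* leading bits of that byte to skip */
    uint16_t window = (uint16_t)(data[index] << 8);

    /* the following byte is only needed for unaligned positions */
    if (shift != 0 && index + 1 < size)
        window |= data[index + 1];

    return (uint8_t)(window >> (8 - shift));
}
```

With a non-negative result pos from the search, byte_at_bit(data, size, pos) should equal the pattern, and pos / 8 and pos % 8 give the byte index and the bit offset within that byte.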
Lukas Thomsen answered Oct 15 '22 19:10