Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check if bytes array contains another array

I've got a very long array of bytes, for example:

Byte[] bytes = {90, 80, 63, 65, 70 ...};

It's nearly 20-30 Mb (theoretically). Is there a fast way to check if this array contains another array, for example:

Byte[] small = {63, 80, 75, 77};

First, I need find bytes in that order which they was defined in small array. Second, I need find array in another array not any byte of small array. Thanks all to advance.

like image 811
kate Avatar asked Jan 26 '16 07:01

kate


People also ask

How to check if an array is present in another array?

Simple Approach: A simple approach is to run two nested loops and generate all subarrays of the array A[] and use one more loop to check if any of the subarray of A[] is equal to the array B[]. Efficient Approach : An efficient approach is to use two pointers to traverse both the array simultaneously.

How do you know if two byte arrays are equal?

equals(byte[] a, byte[] a2) method returns true if the two specified arrays of bytes are equal to one another. Two arrays are equal if they contain the same elements in the same order. Two array references are considered equal if both are null.

What is byte array?

The bytearray() method returns a bytearray object, which is an array of bytes. It returns a mutable series of integers between 0 and 256. The source parameter of the ByteArray is used to initialize the array.


1 Answers

For performance you'll want something like the Boyer-Moore string search algorithm. While it's designed for strings, it should work just as well on byte arrays, and is much more performant than a brute-force solution.

The Wikipedia article provides several implementations, including one in Java and one in C, so creating a C# implementation should be fairly painless.


As it turns out, translating the Wikipedia article's Java implementation of Boyer-Moore to C# (and char to byte) was painless indeed. Here it is:

public static class BoyerMoore
{
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0)
        {
            return 0;
        }

        int[] charTable = MakeCharTable(needle);
        int[] offsetTable = MakeOffsetTable(needle);
        for (int i = needle.Length - 1; i < haystack.Length;)
        {
            int j;
            for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
            {
                if (j == 0)
                {
                    return i;
                }
            }

            i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
        }

        return -1;
    }

    private static int[] MakeCharTable(byte[] needle)
    {
        const int ALPHABET_SIZE = 256;
        int[] table = new int[ALPHABET_SIZE];
        for (int i = 0; i < table.Length; ++i)
        {
            table[i] = needle.Length;
        }

        for (int i = 0; i < needle.Length - 1; ++i)
        {
            table[needle[i]] = needle.Length - 1 - i;
        }

        return table;
    }

    private static int[] MakeOffsetTable(byte[] needle)
    {
        int[] table = new int[needle.Length];
        int lastPrefixPosition = needle.Length;
        for (int i = needle.Length - 1; i >= 0; --i)
        {
            if (IsPrefix(needle, i + 1))
            {
                lastPrefixPosition = i + 1;
            }

            table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
        }

        for (int i = 0; i < needle.Length - 1; ++i)
        {
            int slen = SuffixLength(needle, i);
            table[slen] = needle.Length - 1 - i + slen;
        }

        return table;
    }

    private static bool IsPrefix(byte[] needle, int p)
    {
        for (int i = p, j = 0; i < needle.Length; ++i, ++j)
        {
            if (needle[i] != needle[j])
            {
                return false;
            }
        }

        return true;
    }

    private static int SuffixLength(byte[] needle, int p)
    {
        int len = 0;
        for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j)
        {
            len += 1;
        }

        return len;
    }
}

The algorithm spends a linear bit of time at the start, building its tables; from then it's blazingly fast.

like image 74
Petter Hesselberg Avatar answered Oct 06 '22 00:10

Petter Hesselberg