I've got a very long array of bytes, for example:
Byte[] bytes = {90, 80, 63, 65, 70 ...};
It's nearly 20-30 Mb (theoretically). Is there a fast way to check if this array contains another array, for example:
Byte[] small = {63, 80, 75, 77};
First, I need find bytes in that order which they was defined in small array. Second, I need find array in another array not any byte of small array. Thanks all to advance.
Simple Approach: A simple approach is to run two nested loops and generate all subarrays of the array A[] and use one more loop to check if any of the subarray of A[] is equal to the array B[]. Efficient Approach : An efficient approach is to use two pointers to traverse both the array simultaneously.
equals(byte[] a, byte[] a2) method returns true if the two specified arrays of bytes are equal to one another. Two arrays are equal if they contain the same elements in the same order. Two array references are considered equal if both are null.
The bytearray() method returns a bytearray object, which is an array of bytes. It returns a mutable series of integers between 0 and 256. The source parameter of the ByteArray is used to initialize the array.
For performance you'll want something like the Boyer-Moore string search algorithm. While it's designed for strings, it should work just as well on byte arrays, and is much more performant than a brute-force solution.
The Wikipedia article provides several implementations, including one in Java and one in C, so creating a C# implementation should be fairly painless.
As it turns out, translating the Wikipedia article's Java implementation of Boyer-Moore to C# (and char
to byte
) was painless indeed. Here it is:
public static class BoyerMoore
{
public static int IndexOf(byte[] haystack, byte[] needle)
{
if (needle.Length == 0)
{
return 0;
}
int[] charTable = MakeCharTable(needle);
int[] offsetTable = MakeOffsetTable(needle);
for (int i = needle.Length - 1; i < haystack.Length;)
{
int j;
for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
{
if (j == 0)
{
return i;
}
}
i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
}
return -1;
}
private static int[] MakeCharTable(byte[] needle)
{
const int ALPHABET_SIZE = 256;
int[] table = new int[ALPHABET_SIZE];
for (int i = 0; i < table.Length; ++i)
{
table[i] = needle.Length;
}
for (int i = 0; i < needle.Length - 1; ++i)
{
table[needle[i]] = needle.Length - 1 - i;
}
return table;
}
private static int[] MakeOffsetTable(byte[] needle)
{
int[] table = new int[needle.Length];
int lastPrefixPosition = needle.Length;
for (int i = needle.Length - 1; i >= 0; --i)
{
if (IsPrefix(needle, i + 1))
{
lastPrefixPosition = i + 1;
}
table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
}
for (int i = 0; i < needle.Length - 1; ++i)
{
int slen = SuffixLength(needle, i);
table[slen] = needle.Length - 1 - i + slen;
}
return table;
}
private static bool IsPrefix(byte[] needle, int p)
{
for (int i = p, j = 0; i < needle.Length; ++i, ++j)
{
if (needle[i] != needle[j])
{
return false;
}
}
return true;
}
private static int SuffixLength(byte[] needle, int p)
{
int len = 0;
for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j)
{
len += 1;
}
return len;
}
}
The algorithm spends a linear bit of time at the start, building its tables; from then it's blazingly fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With