I need to write effective and quick method to search byte array for given pattern. I write it this way, what do you think , how to improve? And it has one bug, it cannot return match with length 1.
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
bool found = false;
int matchedBytes = 0;
for (int i = 0; i < bytes.Length; i++)
{
if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length)
{
for (int j = 1; j < pattern.Length; j++)
{
if (bytes[i + j] == pattern[j])
{
matchedBytes++;
if (matchedBytes == pattern.Length - 1)
{
return true;
}
continue;
}
else
{
matchedBytes = 0;
break;
}
}
}
}
return found;
}
Any suggestions ?
The Boyer-Moore algorithm that is used in grep is pretty efficient, and gets more efficient for longer pattern sizes. I'm pretty sure you could make it work for a byte array without too much difficulty, and its wikipedia page has an implementation in Java that should be fairly easy to port to C#.
UPDATE:
Here's an implementation of a simplified version of the Boyer-Moore algorithm for byte arrays in C#. It only uses the second jump table of the full algorithm. Based on the array sizes that you said (haystack: 2000000 bytes, needle: 10 bytes), it's about 5-8 times faster than a simple byte by byte algorithm.
static int SimpleBoyerMooreSearch(byte[] haystack, byte[] needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Length; }
for (int i = 0; i < needle.Length; i++)
{
lookup[needle[i]] = needle.Length - i - 1;
}
int index = needle.Length - 1;
var lastByte = needle.Last();
while (index < haystack.Length)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Length - 2; j >= 0; j--)
{
if (haystack[index - needle.Length + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Length + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}
And it has one bug, it cannot return match with length 1
To fix this, start inner loop from zero:
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
bool found = false;
int matchedBytes = 0;
for (int i = 0; i < bytes.Length; i++)
{
if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length)
{
for (int j = 0; j < pattern.Length; j++) // start from 0
{
if (bytes[i + j] == pattern[j])
{
matchedBytes++;
if (matchedBytes == pattern.Length) // remove - 1
return true;
continue;
}
else
{
matchedBytes = 0;
break;
}
}
}
}
return found;
}
UPDATE: Here is your searching algorithm after flattering and removing local variables (they are not needed)
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
for (int i = 0; i < bytes.Length; i++)
{
if (bytes.Length - i < pattern.Length)
return false;
if (pattern[0] != bytes[i])
continue;
for (int j = 0; j < pattern.Length; j++)
{
if (bytes[i + j] != pattern[j])
break;
if (j == pattern.Length - 1)
return true;
}
}
return false;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With