I have a byte array and wish to find the "occurrences" of some bytes.
For example 00 69 73 6F 6D
in a very large byte array (> 50/100 Megabytes)
OR
even better a reverse operation: Searching the most common pattern without knowing it the code should be able to read and find it from the file.
We can also get the byte array using the below code. byte[] byteArr = str. getBytes("UTF-8");
A byte sequence is a mutable data structure that contains a fixed-length sequence of bytes. Each byte can be indexed in constant time for reading or writing. Given a byte sequence s of length l , we can access each of the l bytes of s via its index in the sequence.
A byte is 8 bits (binary data). A byte array is an array of bytes (tautology FTW!). You could use a byte array to store a collection of binary data, for example, the contents of a file. The downside to this is that the entire file contents must be loaded into memory.
Convert byte[] to String (text data) toString() to get the string from the bytes; The bytes. toString() only returns the address of the object in memory, NOT converting byte[] to a string ! The correct way to convert byte[] to string is new String(bytes, StandardCharsets. UTF_8) .
You can use the Boyer-Moore algorithm to efficiently search for a sequence of bytes in an array of bytes.
Here's a C# version I converted from the Java version from the Wikipedia entry on Boyer-Moore.
public sealed class BoyerMoore
{
readonly byte[] needle;
readonly int[] charTable;
readonly int[] offsetTable;
public BoyerMoore(byte[] needle)
{
this.needle = needle;
this.charTable = makeByteTable(needle);
this.offsetTable = makeOffsetTable(needle);
}
public IEnumerable<int> Search(byte[] haystack)
{
if (needle.Length == 0)
yield break;
for (int i = needle.Length - 1; i < haystack.Length;)
{
int j;
for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
{
if (j != 0)
continue;
yield return i;
i += needle.Length - 1;
break;
}
i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
}
}
static int[] makeByteTable(byte[] needle)
{
const int ALPHABET_SIZE = 256;
int[] table = new int[ALPHABET_SIZE];
for (int i = 0; i < table.Length; ++i)
table[i] = needle.Length;
for (int i = 0; i < needle.Length - 1; ++i)
table[needle[i]] = needle.Length - 1 - i;
return table;
}
static int[] makeOffsetTable(byte[] needle)
{
int[] table = new int[needle.Length];
int lastPrefixPosition = needle.Length;
for (int i = needle.Length - 1; i >= 0; --i)
{
if (isPrefix(needle, i + 1))
lastPrefixPosition = i + 1;
table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
}
for (int i = 0; i < needle.Length - 1; ++i)
{
int slen = suffixLength(needle, i);
table[slen] = needle.Length - 1 - i + slen;
}
return table;
}
static bool isPrefix(byte[] needle, int p)
{
for (int i = p, j = 0; i < needle.Length; ++i, ++j)
if (needle[i] != needle[j])
return false;
return true;
}
static int suffixLength(byte[] needle, int p)
{
int len = 0;
for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j)
++len;
return len;
}
}
Here's some console app test code for it:
public static void Main()
{
byte[] haystack = new byte[10000];
byte[] needle = { 0x00, 0x69, 0x73, 0x6F, 0x6D };
// Put a few copies of the needle into the haystack.
for (int i = 1000; i <= 9000; i += 1000)
Array.Copy(needle, 0, haystack, i, needle.Length);
var searcher = new BoyerMoore(needle);
foreach (int index in searcher.Search(haystack))
Console.WriteLine(index);
}
Note how the Search()
method returns the indices of all the locations of the start of needle
inside haystack
.
If you just wanted the count, you could just do:
int count = new BoyerMoore(needle).Search(haystack).Count();
For your second question: I assume you are asking about finding the longest repeated sequence of bytes?
That's a much more complicated - and very different - question. If you want an answer for that, you should ask a separate question for it, but you should read the Wikipedia entry on the "longest repeated substring problem".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With