How to locate a sequence of values (specifically, bytes) within a larger collection in .NET

I need to parse the bytes from a file so that I only take the data after a certain sequence of bytes has been identified. For example, if the sequence is simply 0xFF (one byte), then I can use LINQ on the collection:

byte[] allBytes = new byte[] {0x00, 0xFF, 0x01};
// An explicitly typed lambda parameter must be parenthesized: (byte b) => ...
var importantBytes = allBytes.SkipWhile(b => b != 0xFF);
// importantBytes = {0xFF, 0x01}

But is there an elegant way to detect a multi-byte sequence (e.g. 0xFF, 0xFF), especially one that backtracks when a partial match turns out to be a false positive?

Pat asked Nov 05 '22 16:11


2 Answers

I'm not aware of any built-in way; as per usual, you can always write your own extension method. Here's one off the top of my head (there may be more efficient ways to implement it):

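using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name is arbitrary.
public static class SequenceExtensions
{
    // Yields every element of source that follows the first occurrence of sequence.
    // The matched sequence itself is not returned.
    public static IEnumerable<T> AfterSequence<T>(this IEnumerable<T> source,
        T[] sequence)
    {
        // An empty sequence matches immediately, so every element is returned.
        bool sequenceFound = sequence.Length == 0;
        // Sliding window holding the most recent sequence.Length elements.
        Queue<T> currentSequence = new Queue<T>(sequence.Length);
        foreach (T item in source)
        {
            if (sequenceFound)
            {
                yield return item;
            }
            else
            {
                currentSequence.Enqueue(item);

                // Not enough elements yet to compare against the sequence.
                if (currentSequence.Count < sequence.Length)
                    continue;

                // Drop the oldest element so the window stays at sequence.Length.
                if (currentSequence.Count > sequence.Length)
                    currentSequence.Dequeue();

                if (currentSequence.SequenceEqual(sequence))
                    sequenceFound = true;
            }
        }
    }
}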

I'll have to check to make sure that this is correct, but it should give you the basic idea: iterate through the elements, keep track of the last few values retrieved (as many as the sequence is long), set a flag when they match the sequence, and once the flag is set, start returning each subsequent element. Note that the matched sequence itself is not part of the output.

Edit - I did run a test, and it does work correctly. Here's some test code:

using System;

class Program
{
    static void Main(string[] args)
    {
        byte[] data = new byte[]
        {
            0x01, 0x02, 0x03, 0x04, 0x05,
            0xFF, 0xFE, 0xFD, 0xFC, 0xFB, 0xFA
        };
        byte[] sequence = new byte[] { 0x02, 0x03, 0x04, 0x05 };
        // Prints 255, 254, 253, 252, 251 and 250, one per line:
        // everything after the matched sequence.
        foreach (byte b in data.AfterSequence(sequence))
        {
            Console.WriteLine(b);
        }
        Console.ReadLine();
    }
}
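
Since the question specifically asks about false positives, here is one more quick check, as a sketch. The data starts with a partial match (0x01, 0x01, 0x01) before the real sequence (0x01, 0x01, 0x02) appears. The extension method is repeated in compact form only so the snippet compiles on its own; the class name BacktrackCheck and the Run helper are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BacktrackCheck
{
    // Compact copy of the AfterSequence idea above (same sliding-window logic).
    public static IEnumerable<T> AfterSequence<T>(this IEnumerable<T> source, T[] sequence)
    {
        bool found = false;
        var window = new Queue<T>(sequence.Length);
        foreach (T item in source)
        {
            if (found) { yield return item; continue; }
            window.Enqueue(item);
            // Keep only the most recent sequence.Length items.
            if (window.Count > sequence.Length) window.Dequeue();
            if (window.Count == sequence.Length && window.SequenceEqual(sequence))
                found = true;
        }
    }

    // Hypothetical helper: runs the overlapping-prefix case.
    public static byte[] Run()
    {
        byte[] data = { 0x01, 0x01, 0x01, 0x02, 0x09 };
        byte[] sequence = { 0x01, 0x01, 0x02 };
        // The window slides one element at a time, so the match starting at
        // index 1 is still found even though index 0 began a failed match.
        return data.AfterSequence(sequence).ToArray();
    }
}
```

Run() should return a single byte, 0x09. The window compares the last three values on every step, which is what gives the "backtracking" behaviour without any explicit rewind.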

Aaronaught answered Nov 15 '22 10:11


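If you convert your bytes into a string, you can take advantage of the many searching functions built into that type (String.IndexOf, for example), even if the bytes you're working with aren't actually characters in the traditional sense. The conversion has to use an encoding that maps every byte value to exactly one character, such as ISO-8859-1; decoding as ASCII or UTF-8 would replace bytes that aren't valid in those encodings. The search should also be ordinal rather than culture-sensitive.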
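
A minimal sketch of that idea, assuming ISO-8859-1 as the encoding (it maps each byte 0x00-0xFF to the character with the same code point, so a character index in the string equals a byte index in the array). The class name ByteSearch and the method IndexOfSequence are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

public static class ByteSearch
{
    // Assumption: ISO-8859-1 is used because it round-trips every byte value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns the index of the first occurrence of pattern in data, or -1.
    public static int IndexOfSequence(byte[] data, byte[] pattern)
    {
        string haystack = Latin1.GetString(data);
        string needle = Latin1.GetString(pattern);
        // Ordinal comparison matches raw char values, with no culture rules.
        return haystack.IndexOf(needle, StringComparison.Ordinal);
    }
}
```

With the index in hand, the bytes after the marker could be taken with something like allBytes.Skip(index + pattern.Length), after checking that the index is not -1. This copies the whole input into a string, so it is probably better suited to small buffers than to large files.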

MikeP answered Nov 15 '22 10:11