What is the most efficient way to find a sequence within a IEnumerable<T>
using LINQ
I want to be able to create an extension method which allows the following call:
int startIndex = largeSequence.FindSequence(subSequence)
The match must be adjacent and in order.
All LINQ methods are extension methods to the IEnumerable<T> interface. That means that you can call any LINQ method on any object that implements IEnumerable<T> . You can even create your own classes that implement IEnumerable<T> , and those classes will instantly "inherit" all LINQ functionality!
Type Parameters T. The type of objects to enumerate. This type parameter is covariant. That is, you can use either the type you specified or any type that is more derived.
Repeat Method is used to generate a collection of IEnumerable<T> type with a specified number of elements and each element contains same specified value. Second Paramter "int count" represents-> Number of times needed to repeat the generated element of sequnce.
Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language. Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support.
Here's an implementation of an algorithm that finds a subsequence in a sequence. I called the method IndexOfSequence
, because it makes the intent more explicit and is similar to the existing IndexOf
method:
public static class ExtensionMethods
{
public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
{
return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
}
public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
{
var seq = sequence.ToArray();
int p = 0; // current position in source sequence
int i = 0; // current position in searched sequence
var prospects = new List<int>(); // list of prospective matches
foreach (var item in source)
{
// Remove bad prospective matches
prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));
// Is it the start of a prospective match ?
if (comparer.Equals(item, seq[0]))
{
prospects.Add(p);
}
// Does current character continues partial match ?
if (comparer.Equals(item, seq[i]))
{
i++;
// Do we have a complete match ?
if (i == seq.Length)
{
// Bingo !
return p - seq.Length + 1;
}
}
else // Mismatch
{
// Do we have prospective matches to fall back to ?
if (prospects.Count > 0)
{
// Yes, use the first one
int k = prospects[0];
i = p - k + 1;
}
else
{
// No, start from beginning of searched sequence
i = 0;
}
}
p++;
}
// No match
return -1;
}
}
I didn't fully test it, so it might still contain bugs. I just did a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. Seems to work fine so far...
I think the complexity is close to O(n), but I'm not an expert of Big O notation so I could be wrong... at least it only enumerates the source sequence once, whithout ever going back, so it should be reasonably efficient.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With