Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find sequence in IEnumerable<T> using Linq

Tags:

c#

.net

linq

What is the most efficient way to find a sequence within a IEnumerable<T> using LINQ

I want to be able to create an extension method which allows the following call:

int startIndex = largeSequence.FindSequence(subSequence)

The match must be adjacent and in order.

like image 878
trampster Avatar asked Aug 24 '10 23:08

trampster


People also ask

Does LINQ work with IEnumerable?

All LINQ methods are extension methods to the IEnumerable<T> interface. That means that you can call any LINQ method on any object that implements IEnumerable<T> . You can even create your own classes that implement IEnumerable<T> , and those classes will instantly "inherit" all LINQ functionality!

What is t in IEnumerable?

Type Parameters T. The type of objects to enumerate. This type parameter is covariant. That is, you can use either the type you specified or any type that is more derived.

Which LINQ method allows you to specify the number of items to return in a sequence?

Repeat Method is used to generate a collection of IEnumerable<T> type with a specified number of elements and each element contains same specified value. Second Paramter "int count" represents-> Number of times needed to repeat the generated element of sequnce.

What is LINQ in C# with example?

Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language. Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support.


1 Answers

Here's an implementation of an algorithm that finds a subsequence in a sequence. I called the method IndexOfSequence, because it makes the intent more explicit and is similar to the existing IndexOf method:

public static class ExtensionMethods
{
    public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
    {
        return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
    }

    public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
    {
        var seq = sequence.ToArray();

        int p = 0; // current position in source sequence
        int i = 0; // current position in searched sequence
        var prospects = new List<int>(); // list of prospective matches
        foreach (var item in source)
        {
            // Remove bad prospective matches
            prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));

            // Is it the start of a prospective match ?
            if (comparer.Equals(item, seq[0]))
            {
                prospects.Add(p);
            }

            // Does current character continues partial match ?
            if (comparer.Equals(item, seq[i]))
            {
                i++;
                // Do we have a complete match ?
                if (i == seq.Length)
                {
                    // Bingo !
                    return p - seq.Length + 1;
                }
            }
            else // Mismatch
            {
                // Do we have prospective matches to fall back to ?
                if (prospects.Count > 0)
                {
                    // Yes, use the first one
                    int k = prospects[0];
                    i = p - k + 1;
                }
                else
                {
                    // No, start from beginning of searched sequence
                    i = 0;
                }
            }
            p++;
        }
        // No match
        return -1;
    }
}

I didn't fully test it, so it might still contain bugs. I just did a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. Seems to work fine so far...

I think the complexity is close to O(n), but I'm not an expert of Big O notation so I could be wrong... at least it only enumerates the source sequence once, whithout ever going back, so it should be reasonably efficient.

like image 79
Thomas Levesque Avatar answered Sep 29 '22 09:09

Thomas Levesque