Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a subsequence in longer sequence

Tags:

c#

I need to find a sequence in other large sequence, for example, {1,3,2,3} is present in {1,3,2,3,4,3} and {5,1,3,2,3}. Is there any way to do it quickly with IEnumerable or with something else?

like image 543
user906763 Avatar asked Sep 06 '11 18:09

user906763


People also ask

How do you find the longest subsequence?

To find the LIS for a given array, we need to return max(L(i)) where 0 < i < n. Formally, the length of the longest increasing subsequence ending at index i, will be 1 greater than the maximum of lengths of all longest increasing subsequences ending at indices before i, where arr[j] < arr[i] (j < i).

How do you find the subsequence of a sequence?

A subsequence is derived from the original sequence by removing elements, keeping the order. One can remove finite or infinite many elements. Valid edge cases are: remove all elements of the original sequence.

Which algorithm is used to find longest consecutive subsequence?

Naive Approach: The idea is to first sort the array and find the longest subarray with consecutive elements. After sorting the array and removing the multiple occurrences of elements, run a loop and keep a count and max (both initially zero).

Which method can be used to solve the longest common subsequence problem?

1. Which of the following methods can be used to solve the longest common subsequence problem? Explanation: Both recursion and dynamic programming can be used to solve the longest subsequence problem.


3 Answers

Similar to @dlev's but this also handles {1,1,1,2}.ContainsSubsequence({1,1,2})

public static bool ContainsSubsequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
    var pattern = target.ToArray();
    var source = new LinkedList<T>();
    foreach (var element in parent) 
    {
        source.AddLast(element);
        if(source.Count == pattern.Length)
        {
            if(source.SequenceEqual(pattern))
                return true;
            source.RemoveFirst();
        }
    }
    return false;
}
like image 61
erikH Avatar answered Oct 12 '22 17:10

erikH


This method will find a subsequence within a parent sequence, of any type that can be compared via Equals():

public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
    bool foundOneMatch = false;
    using (IEnumerator<T> parentEnum = parent.GetEnumerator())
    {
        using (IEnumerator<T> targetEnum = target.GetEnumerator())
        {
            // Get the first target instance; empty sequences are trivially contained
            if (!targetEnum.MoveNext())
                return true;

            while (parentEnum.MoveNext())
            {
                if (targetEnum.Current.Equals(parentEnum.Current))
                {
                    // Match, so move the target enum forward
                    foundOneMatch = true;
                    if (!targetEnum.MoveNext())
                    {
                        // We went through the entire target, so we have a match
                        return true;
                    }
                }
                else if (foundOneMatch)
                {
                    return false;
                }
            }

            return false;
        }
    }
}

You can use it like this:

bool match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 2}); // match == true
match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 3}); // match == false

Note that it assumes the target sequence has no null elements.

Update: Thanks for the upvotes, everyone, but there is actually a bug in the above code! If a partial match is found, but then doesn't turn into a full match, the process is ended, rather than reset (which is obviously incorrected when applied to something like {1, 2, 1, 2, 3}.ContainsSubsequence({1, 2, 3})).

The above code works really well for the more common definition of subsequence (i.e. contiguousness is not required) but in order to handle resetting (which most IEnumerators do not support) the target sequence needs to be enumerated up front. That leads to the following code:

public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
    bool foundOneMatch = false;
    var enumeratedTarget = target.ToList();
    int enumPos = 0;

    using (IEnumerator<T> parentEnum = parent.GetEnumerator())
    {
        while (parentEnum.MoveNext())
        {
            if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
            {
                // Match, so move the target enum forward
                foundOneMatch = true;
                if (enumPos == enumeratedTarget.Count - 1)
                {
                    // We went through the entire target, so we have a match
                    return true;
                }

                enumPos++;
            }
            else if (foundOneMatch)
            {
                foundOneMatch = false;
                enumPos = 0;

                if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
                {
                    foundOneMatch = true;
                    enumPos++;
                }
            }
        }

        return false;
    }
}

This code doesn't have any bugs, but won't work well for large (or infinite) sequences.

like image 22
dlev Avatar answered Oct 12 '22 15:10

dlev


This is quite a well studied problem and according to my research there are two algorithms that are optimal for this job, depending on your data.

Namely, they are the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm.

Here, I submit my implementation of the KMP algorithm, originally reviewed here.

Its written to handle a source or parent sequence of indeterminate length up to Int64.MaxValue.

As we can see, the internal implementation returns a sequence of the indices at which the sub-string or target pattern occurs. You can present these results through your choice of facade.

You could use this simply like this,

var contains = new[] { 1, 3, 2, 3, 4, 3 }.Contains(new[] { 1, 3, 2, 3 });

Here is a working fiddle that shows the code in action.

Below, follows the fully commented code for my answer.

namespace Code
{
    using System;
    using System.Collections.Generic;
    using System.Linq;

    /// <summary>
    /// A generic implementation of the Knuth-Morris-Pratt algorithm that searches,
    /// in a memory efficient way, over a given <see cref="IEnumerable{T}"/>.
    /// </summary>
    public static class KMP
    {
        /// <summary>
        /// Determines whether a sequence contains the search string.
        /// </summary>
        /// <typeparam name="T">
        /// The type of elements of <paramref name="source"/>
        /// </typeparam>
        /// <param name="source">
        /// A sequence of elements
        /// </param>
        /// <param name="pattern">The search string.</param>
        /// <param name="equalityComparer">
        /// Determines whether the sequence contains a specified element.
        /// If <c>null</c>
        /// <see cref="EqualityComparer{T}.Default"/> will be used.
        /// </param>
        /// <returns>
        /// <c>true</c> if the source contains the specified pattern;
        /// otherwise, <c>false</c>.
        /// </returns>
        /// <exception cref="ArgumentNullException">pattern</exception>
        public static bool Contains<T>(
                this IEnumerable<T> source,
                IEnumerable<T> pattern,
                IEqualityComparer<T> equalityComparer = null)
        {
            if (pattern == null)
            {
                throw new ArgumentNullException(nameof(pattern));
            }

            equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;

            return SearchImplementation(source, pattern, equalityComparer).Any();
        }

        public static IEnumerable<long> IndicesOf<T>(
                this IEnumerable<T> source,
                IEnumerable<T> pattern,
                IEqualityComparer<T> equalityComparer = null)
        {
            if (pattern == null)
            {
                throw new ArgumentNullException(nameof(pattern));
            }

            equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;

            return SearchImplementation(source, pattern, equalityComparer);
        }

        /// <summary>
        /// Identifies indices of a pattern string in a given sequence.
        /// </summary>
        /// <typeparam name="T">
        /// The type of elements of <paramref name="source"/>
        /// </typeparam>
        /// <param name="source">
        /// The sequence to search.
        /// </param>
        /// <param name="patternString">
        /// The string to find in the sequence.
        /// </param>
        /// <param name="equalityComparer">
        /// Determines whether the sequence contains a specified element.
        /// </param>
        /// <returns>
        /// A sequence of indices where the pattern can be found
        /// in the source.
        /// </returns>
        /// <exception cref="ArgumentOutOfRangeException">
        /// patternSequence - The pattern must contain 1 or more elements.
        /// </exception>
        private static IEnumerable<long> SearchImplementation<T>(
            IEnumerable<T> source,
            IEnumerable<T> patternString,
            IEqualityComparer<T> equalityComparer)
        {
            // Pre-process the pattern
            (var slide, var pattern) = GetSlide(patternString, equalityComparer);
            var patternLength = pattern.Count;

            if (patternLength == 0)
            {
                throw new ArgumentOutOfRangeException(
                    nameof(patternString),
                    "The pattern must contain 1 or more elements.");
            }

            var buffer = new Dictionary<long, T>(patternLength);
            var more = true;

            long sourceIndex = 0; // index for source
            int patternIndex = 0; // index for pattern

            using(var sourceEnumerator = source.GetEnumerator())
            while (more)
            {
                more = FillBuffer(
                        buffer,
                        sourceEnumerator,
                        sourceIndex,
                        patternLength,
                        out T t);

                if (equalityComparer.Equals(pattern[patternIndex], t))
                {
                    patternIndex++;
                    sourceIndex++;

                    more = FillBuffer(
                        buffer,
                        sourceEnumerator,
                        sourceIndex,
                        patternLength,
                        out t);
                }

                if (patternIndex == patternLength)
                {
                    yield return sourceIndex - patternIndex;
                    patternIndex = slide[patternIndex - 1];
                }
                else if (more && !equalityComparer.Equals(pattern[patternIndex], t))
                {
                    if (patternIndex != 0)
                    {
                        patternIndex = slide[patternIndex - 1];
                    }
                    else
                    {
                        sourceIndex = sourceIndex + 1;
                    }
                }
            }
        }

        /// <summary>
        /// Services the buffer and retrieves the value.
        /// </summary>
        /// <remarks>
        /// The buffer is used so that it is not necessary to hold the
        /// entire source in memory.
        /// </remarks>
        /// <typeparam name="T">
        /// The type of elements of <paramref name="source"/>.
        /// </typeparam>
        /// <param name="buffer">The buffer.</param>
        /// <param name="source">The source enumerator.</param>
        /// <param name="sourceIndex">The element index to retrieve.</param>
        /// <param name="patternLength">Length of the search string.</param>
        /// <param name="value">The element value retrieved from the source.</param>
        /// <returns>
        /// <c>true</c> if there is potentially more data to process;
        /// otherwise <c>false</c>.
        /// </returns>
        private static bool FillBuffer<T>(
            IDictionary<long, T> buffer,
            IEnumerator<T> source,
            long sourceIndex,
            int patternLength,
            out T value)
        {
            bool more = true;
            if (!buffer.TryGetValue(sourceIndex, out value))
            {
                more = source.MoveNext();
                if (more)
                {
                    value = source.Current;
                    buffer.Remove(sourceIndex - patternLength);
                    buffer.Add(sourceIndex, value);
                }
            }

            return more;
        }

        /// <summary>
        /// Gets the offset array which acts as a slide rule for the KMP algorithm.
        /// </summary>
        /// <typeparam name="T">
        /// The type of elements of <paramref name="source"/>.
        /// </typeparam>
        /// <param name="pattern">The search string.</param>
        /// <param name="equalityComparer">
        /// Determines whether the sequence contains a specified element.
        /// If <c>null</c>
        /// <see cref="EqualityComparer{T}.Default"/> will be used.
        /// </param>
        /// <returns>A tuple of the offsets and the enumerated pattern.</returns>
        private static (IReadOnlyList<int> Slide, IReadOnlyList<T> Pattern) GetSlide<T>(
                IEnumerable<T> pattern,
                IEqualityComparer<T> equalityComparer)
        {
            var patternList = pattern.ToList();
            var slide = new int[patternList.Count];

            int length = 0;
            int patternIndex = 1;

            while (patternIndex < patternList.Count)
            {
                if (equalityComparer.Equals(
                        patternList[patternIndex],
                        patternList[length]))
                {
                    length++;
                    slide[patternIndex] = length;
                    patternIndex++;
                }
                else
                {
                    if (length != 0)
                    {
                        length = slide[length - 1];
                    }
                    else
                    {
                        slide[patternIndex] = length;
                        patternIndex++;
                    }
                }
            }

            return (slide, patternList);
        }
    }
}
like image 31
Jodrell Avatar answered Oct 12 '22 15:10

Jodrell