Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intersect two lists with preserved order in the first one

I'm facing a problem I don't even know what to search in Google/Stack Overflow. So comment if you feel the need for further explanation, questions.

Basically I want to intersect two lists and return the similarity with the preserved order of the original first string value.

Example:

I have two strings, that I convert to a CharArray. I want to Intersect these two arrays and return the values that are similar, including/with the order of the first string (s1).


As you can see the first string contains E15 (in that specific order), and so does the seconds one.

So these two strings will return : { 'E', '1', '5' }

string s1 = "E15QD(A)";
string s2 = "NHE15H";

The problem I am facing is that if i replace "s2" with:

string s2 = "NQE18H" // Will return {'Q', 'E', '1' }

My operation will return : {'Q', 'E', '1' }

The result should be : {'E', '1' } because Q don't follow the letter 1

Currently my operation is not the greatest effort, because i don't know which methods to use in .NET to be able to do this.

Current code:

List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();

var result = cA1.Where(x => cA2.Contains(x)).ToList();

Feel free to help me out, pointers in the right direction is acceptable as well as a full solution.

like image 938
J.Olsson Avatar asked Mar 13 '15 10:03

J.Olsson


1 Answers

This is a "longest common substring" problem.

You can use this extension to get all substrings lazily:

public static class StringExtensions
{
    public static IEnumerable<string> GetSubstrings(this string str)
    {
        if (string.IsNullOrEmpty(str))
            throw new ArgumentException("str must not be null or empty", "str");

        for (int c = 0; c < str.Length - 1; c++)
        {
            for (int cc = 1; c + cc <= str.Length; cc++)
            {
                yield return str.Substring(c, cc);
            }
        }
    }
}

Then it's easy and readable with this LINQ query:

string longestIntersection = "E15QD(A)".GetSubstrings()
    .Intersect("NQE18H".GetSubstrings())
    .OrderByDescending(s => s.Length)
    .FirstOrDefault();  // E1

Enumerable.Intersect is also quite efficient since it's using a set. One note: if one both strings is larger than the other then it's more efficient(in terms of memory) to use it first:

longString.GetSubstrings().Intersect(shortString.GetSubstrings())
like image 83
Tim Schmelter Avatar answered Nov 02 '22 19:11

Tim Schmelter