Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find a common string within a list of strings

I'm very close on this. I got a question posed to me yesterday by a developer if I could have a look at this.

I feel close, but I think some people here would appreciate the challenge too and I am lost.

If I have a List<string> which has the following members:

Today

Monday

Tuesday

Wednesday

I want to get a return string day because this is the largest common string in the List<string>. This should be done irrespective of position and string length, just want to find the largest length common string in a host of strings.

My attempt failed a bit miserably, I selected:

Monday - Tuesday

Monday - Wednesday

And then did an Intersect between each. Obviously this would return multiple strings, however for Monday - Wednesday you get nday because that is what letters it has common.

Here is my code:

  List<string> strs = new List<string>();
  strs.Add("Monday");
  strs.Add("Tuesday");
  strs.Add("Wednesday");

  var v = strs.SelectMany((day, i) => strs.Select((day2, j) => new
  {
    iDay = i,
    Day = day,
    iDay2 = j,
    Day2 = day2
  })).Where(x => x.iDay != x.iDay2).Select(x => new string(x.Day.Intersect(x.Day2).ToArray()));

Anybody have a nice and neat solution?

NOTE

It doesn't have to be LINQ

If there isn't a common string, return null or empty string.

like image 373
LukeHennerley Avatar asked Nov 22 '12 09:11

LukeHennerley


People also ask

How do you find the common substring of a list in Python?

Find common substring in list of strings python. The Python string count method can be used to check if a string contains a substring by counting the number of times the substring exists in the broader string. The method will return the number times the substring exists.

How do I find a string in a list?

Python Find String in List using count() We can also use count() function to get the number of occurrences of a string in the list. If its output is 0, then it means that string is not present in the list. l1 = ['A', 'B', 'C', 'D', 'A', 'A', 'C'] s = 'A' count = l1.


2 Answers

This works better than my first approach(striked out).

You can use following extension to get all substrings of the shortest string in the list(for efficiency):

public static IEnumerable<string> getAllSubstrings(this string word)
{
    return from charIndex1 in Enumerable.Range(0, word.Length)
           from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
           where charIndex2 > 0
           select word.Substring(charIndex1, charIndex2);
}
  • now order these substrings by Length(longest first)
  • look if all other strings(excluding the string itself because that test is redundant) contain that substring (Enumerable.All returns immediately if one string doesn't contain a given substring)
  • if one string appears in all others you have found the longest common substring
  • otherwise repeat that until you've checked all substrings(if no common string was found)

string shortest = list.OrderBy(s => s.Length).First();
IEnumerable<string> shortestSubstrings = shortest
    .getAllSubstrings()
    .OrderByDescending(s => s.Length);
var other = list.Where(s => s != shortest).ToArray();
string longestCommonIntersection = string.Empty;
foreach (string subStr in shortestSubstrings)
{
    bool allContains = other.All(s => s.Contains(subStr));
    if (allContains)
    {
        longestCommonIntersection = subStr;
        break;
    }
}

DEMO

like image 103
Tim Schmelter Avatar answered Oct 21 '22 02:10

Tim Schmelter


Find the shortest entry in the list.

  • Today
  • Monday
  • Tuesday
  • Wednesday

So we use "Today".

Build a list of strings of consecutive characters in "Today" of the length of the string down to each character, in "longest first" order.

"Today",

"Toda", "oday",

"Tod", "oda", "day",

"To", "od", "da", "ay",

"t", "o", "d", "a", "y"

Enumerate over this list, finding the first entry for which all the other strings contain that entry.

        List<string> words = new List<string> { "Today", "Monday", "Tuesday", "Wednesday" };

        // Select shortest word in the list
        string shortestWord = (from word in words
                            orderby word.Length
                            select word).First();

        int shortWordLength = shortestWord.Length;

        // Build up the list of consecutive character strings, in length order.
        List<string> parts = new List<string>();
        for (int partLength = shortWordLength; partLength > 0; partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                parts.Add(shortestWord.Substring(partStartIndex, partLength));
            }
        }
        // Find the first part which is in all the words.
        string longestSubString = (from part in parts where words.All(s => s.Contains(part)) select part).FirstOrDefault();

       // longestSubString is the longest part of all the words, or null if no matches are found.

EDIT

Thinking a little more about it, you can optimise a little.

You don't need to build up a list of parts - just test each part as it is generated. Also, by sorting the word list in length order, you always test against the shortest strings first to reject candidate parts more quickly.

        string longestSubString = null;

        List<string> words = new List<string> { "Todays", "Monday", "Tuesday" };

        // Sort word list by length
        List<string> wordsInLengthOrder = (from word in words
                                           orderby word.Length
                                           select word).ToList();

        string shortestWord = wordsInLengthOrder[0];
        int shortWordLength = shortestWord.Length;

        // Work through the consecutive character strings, in length order.
        for (int partLength = shortWordLength; (partLength > 0) && (longestSubString == null); partLength--)
        {
            for (int partStartIndex = 0; partStartIndex <= shortWordLength - partLength; partStartIndex++)
            {
                string part = shortestWord.Substring(partStartIndex, partLength);

                // Test if all the words in the sorted list contain the part.
                if (wordsInLengthOrder.All(s => s.Contains(part)))
                {
                    longestSubString = part;
                    break;
                }
            }

        }

        Console.WriteLine(longestSubString);
like image 42
Neil Moss Avatar answered Oct 21 '22 02:10

Neil Moss