
Occurrences of a List<string> in a string C#

Tags: string, c#, linq

Given

var stringList = new List<string>(new string[] {
                   "outage", "restoration", "efficiency" });

var queryText = "While walking through the park one day, I noticed an outage " +
                "in the lightbulb at the plant. I talked to an officer about " +
                "restoration protocol for public works, and he said to contact " +
                "the department of public works, but not to expect much because " +
                "they have low efficiency.";

How do I get the total number of occurrences of all strings in stringList within queryText?

In the above example, I would want a method that returns 3:

private int stringMatches (string textToQuery, string[] stringsToFind)
{
    //
}

RESULTS

SPOKE TOO SOON!

I ran a couple of performance tests, and this code from Fabian was faster by a good margin:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}

Execution time over a 10,000-iteration loop, measured with Stopwatch:

Fabian: 37-42 milliseconds

lazyberezovsky StringCompare: 400-500 milliseconds

lazyberezovsky Regex: 630-680 milliseconds

Glenn: 750-800 milliseconds

(I added StringComparison.Ordinal to Fabian's answer for additional speed.)

Wesley asked Jul 26 '13 22:07
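The question does not show the timing harness itself, only that it was a 10,000-iteration loop measured with a stopwatch. The following is a minimal sketch of how such a measurement could be set up, not the harness that was actually used. The class name MatchBenchmark and the helper Time are hypothetical, and the warm-up call is an assumption added so that JIT compilation is not included in the measurement.

```csharp
using System;
using System.Diagnostics;

public static class MatchBenchmark
{
    // Same counting approach as the code in the question:
    // ordinal substring search, advancing one character after each hit.
    public static int StringMatches(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: runs `action` `iterations` times and returns the elapsed time.
    public static TimeSpan Time(Func<int> action, int iterations)
    {
        action(); // warm-up call (assumption: keeps JIT cost out of the measurement)

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as MatchBenchmark.Time(() => MatchBenchmark.StringMatches(queryText, stringsToFind), 10000) would then give one timing per candidate method. Absolute numbers depend on the machine and build configuration, so only the relative ordering is likely to be meaningful.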


3 Answers

That might also be fast:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
Fabian Bigler answered Nov 20 '22 15:11
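One detail of the IndexOf loop above is worth spelling out: because the index advances by one character after each hit, overlapping occurrences are counted separately (searching "aaaa" for "aa" gives 3). The loop also assumes every search string is non-empty, since IndexOf never returns -1 for an empty string. If non-overlapping counts are wanted instead, a variant along these lines might work. This is a sketch under those assumptions, not part of the original answer, and the names NonOverlappingCounter and CountNonOverlapping are hypothetical.

```csharp
using System;

public static class NonOverlappingCounter
{
    // Counts ordinal substring occurrences, skipping past each full match
    // so that overlapping occurrences are not counted twice.
    public static int CountNonOverlapping(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            // Assumption: null or empty search strings are simply ignored.
            if (string.IsNullOrEmpty(stringToFind))
            {
                continue;
            }

            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex += stringToFind.Length; // jump past the whole match
                count++;
            }
        }
        return count;
    }
}
```

For ordinary prose such as the example in the question, both variants should return the same count. They only differ when a search string can overlap with itself.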


This LINQ query splits the text on spaces and punctuation symbols, then counts the words that match, ignoring case:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
    return textToQuery.Split(new[] { ' ', '.', ',', '!', '?' }) // add more separators if needed
                      .Count(w => stringsToFind.Contains(w, comparer));
}

Or with a regular expression:

private static int stringMatches(string textToQuery, string[] stringsToFind)
{
    // Regex.Escape keeps characters such as '.' or '+' in a search string from being read as regex syntax.
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + Regex.Escape(s) + @"\b"));
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}
Sergey Berezovskiy answered Nov 20 '22 15:11
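The answers on this page do not all count the same thing: the IndexOf loop counts substrings, while the \b-anchored regular expression counts whole words only. The sketch below puts the two side by side so the difference is visible on a small input. The class and method names are hypothetical, and the early return for an empty search list is an added assumption (an empty pattern would otherwise match at every position).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordCounter
{
    // Whole-word, case-insensitive count using \b word boundaries.
    public static int CountWholeWords(string textToQuery, string[] stringsToFind)
    {
        if (stringsToFind.Length == 0)
        {
            return 0; // assumption: nothing to search for means zero matches
        }

        var pattern = string.Join("|", stringsToFind.Select(s => @"\b" + Regex.Escape(s) + @"\b"));
        return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
    }

    // Substring, case-sensitive (ordinal) count, as in the IndexOf approach.
    public static int CountSubstrings(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }
}
```

With the text "Two outages and one outage" and the single search string "outage", the substring count is 2 (it also finds the hit inside "outages"), while the whole-word count is 1. Which behavior is right depends on what "occurrence" should mean for the data at hand.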


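If you want to count the distinct words in the string that are also in the other collection (Split() with no arguments splits on whitespace only, so a word with attached punctuation, such as "efficiency.", does not match):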

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    return textToQuery.Split().Intersect(stringsToFind).Count();
}
Tim Schmelter answered Nov 20 '22 15:11
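Because Intersect produces a set, the approach above counts each matching word once, however often it appears in the text. The sketch below contrasts that with a per-occurrence count and also splits on some punctuation. It is an illustration rather than a drop-in replacement: the names WordSetCounter, CountDistinct and CountOccurrences are hypothetical, and both the separator list and the case-insensitive comparison are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSetCounter
{
    // Assumed separator set; extend it to suit the input text.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '.', ',', '!', '?', ';', ':' };

    // Number of distinct search words that appear in the text at least once.
    public static int CountDistinct(string textToQuery, string[] stringsToFind)
    {
        return textToQuery
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Intersect(stringsToFind, StringComparer.OrdinalIgnoreCase)
            .Count();
    }

    // Total number of words in the text that equal any of the search words.
    public static int CountOccurrences(string textToQuery, string[] stringsToFind)
    {
        var lookup = new HashSet<string>(stringsToFind, StringComparer.OrdinalIgnoreCase);
        return textToQuery
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => lookup.Contains(word));
    }
}
```

For the text "Outage, outage and restoration." and the three search words from the question, CountDistinct gives 2 and CountOccurrences gives 3. On the question's own example both give 3, since each word appears exactly once.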