Given
var stringList = new List<string>(new string[] {
    "outage", "restoration", "efficiency" });
var queryText = "While walking through the park one day, I noticed an outage " +
    "in the lightbulb at the plant. I talked to an officer about " +
    "restoration protocol for public works, and he said to contact " +
    "the department of public works, but not to expect much because " +
    "they have low efficiency.";
How do I get the total number of occurrences of all strings in stringList within queryText?
In the above example, I would want a method that returns 3:
private int stringMatches (string textToQuery, string[] stringsToFind)
{
//
}
RESULTS
SPOKE TOO SOON!
I ran a couple of performance tests, and this code from Fabian was faster by a good margin:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
Execution time over a 10,000-iteration loop, measured with Stopwatch:
Fabian: 37-42 milliseconds
lazyberezovsky StringCompare: 400-500 milliseconds
lazyberezovsky Regex: 630-680 milliseconds
Glenn: 750-800 milliseconds
(I added StringComparison.Ordinal to Fabian's answer for additional speed.)
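For anyone who wants to reproduce timings like these, here is a minimal sketch of a Stopwatch loop. The class name MatchBenchmark and the helper Measure are hypothetical names made up for this illustration, and the warm-up call is my own assumption about how to keep JIT compilation out of the measurement; the original test harness was not posted, so numbers will differ by machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class MatchBenchmark
{
    // Same counting loop as the fastest candidate in the results above.
    public static int StringMatches(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: times `iterations` calls of `candidate` with a Stopwatch.
    public static TimeSpan Measure(Func<string, string[], int> candidate,
                                   string text, string[] words, int iterations)
    {
        candidate(text, words); // warm-up call (assumption: keeps JIT cost out of the timing)
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            candidate(text, words);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as MatchBenchmark.Measure(MatchBenchmark.StringMatches, queryText, stringList.ToArray(), 10000) returns the elapsed time for the whole loop; any other candidate with the same signature can be passed in its place.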
That might also be fast:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
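Two details of the loop above are worth knowing. Because it advances by one character after each hit, overlapping occurrences are counted (searching "aaa" for "aa" gives 2). Also, an empty search string matches at every position, so the start index eventually moves past the end of the text and IndexOf throws ArgumentOutOfRangeException. The following is only a sketch of a variant that counts non-overlapping occurrences and skips empty entries; the names NonOverlappingCounter and CountNonOverlapping are made up for this example, and whether you want overlapping matches depends on your data.

```csharp
using System;

public static class NonOverlappingCounter
{
    // Hypothetical variant: counts non-overlapping, case-sensitive occurrences.
    public static int CountNonOverlapping(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            // Skip null or empty entries; an empty string would match everywhere.
            if (string.IsNullOrEmpty(stringToFind))
            {
                continue;
            }

            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                // Jump past the whole match so the next search cannot overlap it.
                currentIndex += stringToFind.Length;
                count++;
            }
        }
        return count;
    }
}
```

For the sample text in the question both versions give the same answer, since none of the three words can overlap with itself.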
This LINQ query splits the text on spaces and punctuation symbols and counts the matching words, ignoring case:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
    return textToQuery.Split(new[] { ' ', '.', ',', '!', '?' }) // add more separators if needed
        .Count(w => stringsToFind.Contains(w, comparer));
}
Or with a regular expression:
private static int stringMatches(string textToQuery, string[] stringsToFind)
{
    // Regex.Escape keeps characters such as '.' or '+' in a search string from being read as regex syntax.
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + Regex.Escape(s) + @"\b"));
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}
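Note that the word-based answers and the IndexOf answer do not count the same thing: IndexOf finds substrings, while the \b pattern only matches whole words. Here is a small sketch of the difference; WholeWordDemo and its two methods are hypothetical names for this illustration, the search word is assumed to be non-empty, and Regex.Escape is added so that metacharacters in the word are matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Counts every position where `word` occurs, including inside longer words.
    public static int CountSubstrings(string text, string word)
    {
        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(word, index, StringComparison.OrdinalIgnoreCase)) != -1)
        {
            index++;
            count++;
        }
        return count;
    }

    // Counts only occurrences delimited by word boundaries on both sides.
    public static int CountWholeWords(string text, string word)
    {
        var pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

With the text "Two outages and one outage" and the word "outage", CountSubstrings returns 2 (it also finds the match inside "outages"), while CountWholeWords returns 1. Which behavior is right depends on whether plurals and other inflected forms should count.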
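If you want to count the distinct words in the string that are also in the other collection:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    // Split() with no arguments splits on whitespace only, so a word with
    // trailing punctuation (e.g. "efficiency.") does not match; Intersect
    // also counts each distinct word once.
    return textToQuery.Split().Intersect(stringsToFind).Count();
}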
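Intersect is a set operation, so a word that appears several times in the text is counted once. If every occurrence should count, one possible sketch is to look each word up in a HashSet instead. The names WordSetCounter and CountWordOccurrences, the separator list, and the choice of case-insensitive matching are all assumptions made for this example, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSetCounter
{
    // Hypothetical variant: counts every word of the text that is in the search set.
    public static int CountWordOccurrences(string textToQuery, string[] stringsToFind)
    {
        // A HashSet gives constant-time lookups; the comparer makes them case-insensitive.
        var wanted = new HashSet<string>(stringsToFind, StringComparer.OrdinalIgnoreCase);

        // Assumed separator list; extend it to suit the punctuation in your text.
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        return textToQuery
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => wanted.Contains(word));
    }
}
```

For example, "An outage. Another OUTAGE, then restoration!" gives 3 with the question's three search words, because both spellings of "outage" are counted.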