 

Maximum number of occurrences a character appears in an array of strings

In C#, given the array:

string[] myStrings = new string[] {
  "test#test",
  "##test",
  "######", // Winner (outputs 6)
};

How can I find the maximum number of times the character # appears in a single string?

My current solution is:

int maxOccurrences = 0;
foreach (var myString in myStrings)
{
    var occurrences = myString.Count(x => x == '#');
    if (occurrences > maxOccurrences)
    {
        maxOccurrences = occurrences;
    }
}

return maxOccurrences;

Is there a simpler way, using LINQ, that can act directly on the myStrings array?

And can this be made into an extension method that works on any IEnumerable<string>?

MrDeveloper asked Jan 07 '23 18:01



1 Answer

First of all, let's project your strings into a sequence containing the number of matches in each one:

myStrings.Select(s => s.Count(c => c == '#')) // {1, 2, 6} in your example

Then pick the maximum value:

int maximum = myStrings
    .Select(s => s.Count(x => x == '#'))
    .Max(); // 6 in your example

Let's make an extension method:

public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, char ch)
{
    return strings
        .Select(s => s.Count(c => c == ch))
        .Max();
}
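
As a minimal sketch of how this could be wired up and called: extension methods must live in a static class, and the class name StringSequenceExtensions below is only a placeholder chosen for illustration. The method body is the one shown above, repeated so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; any static class will do.
public static class StringSequenceExtensions
{
    // Same method as above, repeated so this sketch is self-contained.
    public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, char ch)
    {
        return strings
            .Select(s => s.Count(c => c == ch))
            .Max();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] myStrings = { "test#test", "##test", "######" };
        Console.WriteLine(myStrings.CountMaximumOccurrencesOf('#')); // 6

        // Works on any IEnumerable<string>, not only on arrays.
        var list = new List<string> { "a#b", "no match" };
        Console.WriteLine(list.CountMaximumOccurrencesOf('#')); // 1
    }
}
```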

However, there is a big HOWEVER. What C# calls a char is not necessarily what you would call a character in your language. This has been widely discussed in other posts, for example: Fastest way to split a huge text into smaller chunks and How can I perform a Unicode aware character by character comparison?, so I won't repeat everything here. To be "Unicode aware" you need to make your code more complicated (please note that the code was written here, so it is untested):

private static IEnumerable<string> EnumerateCharacters(this string s)
{
    var enumerator = StringInfo.GetTextElementEnumerator(s.Normalize());
    while (enumerator.MoveNext())
        yield return (string)enumerator.Value;
}

Then change our original code to:

public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, string character)
{
    return strings
        .Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, StringComparison.CurrentCulture)))
        .Max();
}

Note that Max() alone requires the collection not to be empty (use DefaultIfEmpty() if the collection may be empty and that is not an error). To avoid deciding arbitrarily what to do in this situation (throw an exception or just return 0), you can make this method less specialized and leave that responsibility to the caller:

public static IEnumerable<int> CountOccurrencesOf(this IEnumerable<string> strings,
    string character,
    StringComparison comparison = StringComparison.CurrentCulture)
{
    // The argument must be exactly one text element (one user-perceived character).
    Debug.Assert(character.EnumerateCharacters().Count() == 1);

    return strings
        .Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, comparison)));
}

Used like this:

var maximum = myStrings.CountOccurrencesOf("#").Max();

If you need it case-insensitive:

var maximum = myStrings.CountOccurrencesOf("à", StringComparison.CurrentCultureIgnoreCase)
    .Max();
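
To illustrate the earlier note about empty input, here is a small sketch of the two choices left to the caller. The helper name MaxOrZero is hypothetical and used only for this example; DefaultIfEmpty() and Max() are the standard LINQ methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Hypothetical helper: treats "no strings at all" as a maximum of 0.
    public static int MaxOrZero(IEnumerable<int> counts)
    {
        // DefaultIfEmpty() yields a single default(int), i.e. 0, when the source is empty.
        return counts.DefaultIfEmpty().Max();
    }

    public static void Main()
    {
        // Stand-in for the per-string counts produced from an empty input sequence.
        IEnumerable<int> noCounts = Enumerable.Empty<int>();

        Console.WriteLine(MaxOrZero(noCounts));            // 0
        Console.WriteLine(MaxOrZero(new[] { 1, 2, 6 }));   // 6

        try
        {
            // Without DefaultIfEmpty(), Max() on an empty sequence of int throws.
            noCounts.Max();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Max() threw on an empty sequence");
        }
    }
}
```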

As you can now imagine, this kind of comparison isn't limited to some esoteric languages: it also applies to the invariant culture (which is associated with English). For strings that must always be compared with the invariant culture you should specify StringComparison.InvariantCulture. Don't forget that you may also need to call String.Normalize() on the input character.
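
To make the difference between a char and a character concrete, here is a small sketch using "à" written in two ways. It assumes default globalization settings (not invariant mode). The variable names are illustrative only.

```csharp
using System;
using System.Globalization;

public static class Program
{
    public static void Main()
    {
        string precomposed = "\u00E0";  // "à" as a single code point
        string decomposed = "a\u0300";  // "a" followed by a combining grave accent

        // The two strings differ in their number of char values...
        Console.WriteLine(precomposed.Length); // 1
        Console.WriteLine(decomposed.Length);  // 2

        // ...but each one is a single text element.
        Console.WriteLine(new StringInfo(precomposed).LengthInTextElements); // 1
        Console.WriteLine(new StringInfo(decomposed).LengthInTextElements);  // 1

        // An ordinal comparison sees different char sequences.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

        // Normalize() (Form C by default) composes the pair, so the two forms can be compared.
        Console.WriteLine(string.Equals(precomposed, decomposed.Normalize(), StringComparison.Ordinal));
    }
}
```

This is why both the searched strings and the input character may need the same normalization before they are compared.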

Adriano Repetti answered Jan 10 '23 15:01
