Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# performance of static string[] contains() (slooooow) vs. == operator

Tags:

c#

Just a quick query: I had a piece of code which compared a string against a long list of values, e.g.

if(str == "string1" || str == "string2" || str == "string3" || str == "string4".
     DoSomething();

And the interest of code clarity and maintainability I changed it to

public static string[] strValues = { "String1", "String2", "String3", "String4"};
...
if(strValues.Contains(str)
    DoSomething();

Only to find the code execution time went from 2.5secs to 6.8secs (executed ca. 200,000 times).
I certainly understand a slight performance trade off, but 300%?
Anyway I could define the static strings differently to enhance performance?
Cheers.

like image 462
Andrew White Avatar asked May 20 '10 22:05

Andrew White


2 Answers

Fyi..

Using:

private static HashSet<string> strHashSet = new HashSet<string>() 
{ "0string", "1string", "2string", "3string", "4string", "5string", 
  "6string", "7string", "8string", "9string", "Astring", "Bstring" };

private static List<string> strList = strHashSet.ToList();
private static string[] strArray = strList.ToArray();
private static Dictionary<int, string> strHashDict = strHashSet.ToDictionary(h => h.GetHashCode());
private static Dictionary<string, string> strDict = strHashSet.ToDictionary(h => h);

// Only one test uses this method.
private static bool ExistsInList(string str)
{
  return strHashDict.ContainsKey(str.GetHashCode());
}

Checking for the first and last strings in the list then checking for a string not in the list: "xstring" Executing 500,000 iterations, all times in milliseconds.

1.A Test: result = (str == "0string" || str == "1string" ...
[storage var]          [first]:[ last ]:[ none ]:[average]
strArray                 3.78 :  45.90 :  57.77 :  35.82

2.A Test: ExistsInList(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
none                    36.14 :  28.97 :  24.02 :  29.71

3.A Test: .ContainsKey(string.GetHashCode());
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashDict             34.86 :  28.41 :  21.46 :  28.24

4.A Test: .ContainsKey(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strDict                 38.99 :  32.34 :  22.75 :  31.36

5.A Test: .Contains(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              39.54 :  34.78 :  24.17 :  32.83
strList                 23.36 : 122.07 : 127.38 :  90.94
strArray               350.34 : 426.29 : 426.05 : 400.90

6.A Test: .Any(p => p == string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              75.70 : 331.38 : 339.40 : 248.82
strList                 72.51 : 305.00 : 319.29 : 232.26
strArray                38.49 : 213.63 : 227.13 : 159.75

Interesting (if not unexpected) results when we change the strings in the list:

private static HashSet<string> strHashSet = new HashSet<string>() 
{ "string00", "string01", "string02", "string03", "string04", "string05", 
  "string06", "string07", "string08", "string09", "string10", "string11" };

With "string99" as the none check.

1.B Test: result = (str == "string00" || str == "string01" ...
[storage var]          [first]:[ last ]:[ none ]:[average]
strArray                85.45 :  87.06 :  91.82 :  88.11

2.B Test: ExistsInList(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
none                    30.12 :  27.97 :  21.36 :  26.48

3.B Test: .ContainsKey(string.GetHashCode());
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashDict             32.51 :  28.00 :  20.83 :  27.11

4.B Test: .ContainsKey(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strDict                 36.45 :  32.13 :  22.39 :  30.32

5.B Test: .Contains(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              37.29 :  34.33 :  23.56 :  31.73
strList                 23.34 : 147.75 : 153.04 : 108.04
strArray               349.62 : 460.19 : 459.99 : 423.26

6.B Test: .Any(p => p == string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              76.26 : 355.09 : 361.31 : 264.22
strList                 70.20 : 332.33 : 341.79 : 248.11
strArray                37.23 : 234.70 : 251.81 : 174.58

For cases A and B looks like tests 2 and 3 have the advantage.

However, HashSet.Contains(string) is very efficient, not effected by list contents and has a clear syntax...might be the best choice.

Yes, it is true, I have no life.

like image 105
Rusty Avatar answered Sep 25 '22 04:09

Rusty


Are you actually running this 200,000 times in production code? You may want to consider hash checks as a faster negative-check if so.

If the 200,000 times was just to illustrate the difference, then I wouldn't worry about it. It's only a 0.02 millisecond increase in time.

Contains is more general-purpose than testing static strings, so there is a small amount of overhead. Unless this code is a bottleneck as Mark mentioned, it's not worth optimizing. There's a famous quote in CS: "premature optimization is the root of all evil." The quote isn't quite accurate, but it's a good reminder of the ultimate optimization guideline: measure first.

like image 31
Stephen Cleary Avatar answered Sep 24 '22 04:09

Stephen Cleary