Just a quick query: I had a piece of code which compared a string against a long list of values, e.g.
if(str == "string1" || str == "string2" || str == "string3" || str == "string4".
DoSomething();
And the interest of code clarity and maintainability I changed it to
public static string[] strValues = { "String1", "String2", "String3", "String4"};
...
if(strValues.Contains(str)
DoSomething();
Only to find the code execution time went from 2.5secs to 6.8secs (executed ca. 200,000 times).
I certainly understand a slight performance trade off, but 300%?
Anyway I could define the static strings differently to enhance performance?
Cheers.
Fyi..
Using:
private static HashSet<string> strHashSet = new HashSet<string>()
{ "0string", "1string", "2string", "3string", "4string", "5string",
"6string", "7string", "8string", "9string", "Astring", "Bstring" };
private static List<string> strList = strHashSet.ToList();
private static string[] strArray = strList.ToArray();
private static Dictionary<int, string> strHashDict = strHashSet.ToDictionary(h => h.GetHashCode());
private static Dictionary<string, string> strDict = strHashSet.ToDictionary(h => h);
// Only one test uses this method.
private static bool ExistsInList(string str)
{
return strHashDict.ContainsKey(str.GetHashCode());
}
Checking for the first and last strings in the list then checking for a string not in the list: "xstring" Executing 500,000 iterations, all times in milliseconds.
1.A Test: result = (str == "0string" || str == "1string" ...
[storage var] [first]:[ last ]:[ none ]:[average]
strArray 3.78 : 45.90 : 57.77 : 35.82
2.A Test: ExistsInList(string);
[storage var] [first]:[ last ]:[ none ]:[average]
none 36.14 : 28.97 : 24.02 : 29.71
3.A Test: .ContainsKey(string.GetHashCode());
[storage var] [first]:[ last ]:[ none ]:[average]
strHashDict 34.86 : 28.41 : 21.46 : 28.24
4.A Test: .ContainsKey(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strDict 38.99 : 32.34 : 22.75 : 31.36
5.A Test: .Contains(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 39.54 : 34.78 : 24.17 : 32.83
strList 23.36 : 122.07 : 127.38 : 90.94
strArray 350.34 : 426.29 : 426.05 : 400.90
6.A Test: .Any(p => p == string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 75.70 : 331.38 : 339.40 : 248.82
strList 72.51 : 305.00 : 319.29 : 232.26
strArray 38.49 : 213.63 : 227.13 : 159.75
Interesting (if not unexpected) results when we change the strings in the list:
private static HashSet<string> strHashSet = new HashSet<string>()
{ "string00", "string01", "string02", "string03", "string04", "string05",
"string06", "string07", "string08", "string09", "string10", "string11" };
With "string99" as the none check.
1.B Test: result = (str == "string00" || str == "string01" ...
[storage var] [first]:[ last ]:[ none ]:[average]
strArray 85.45 : 87.06 : 91.82 : 88.11
2.B Test: ExistsInList(string);
[storage var] [first]:[ last ]:[ none ]:[average]
none 30.12 : 27.97 : 21.36 : 26.48
3.B Test: .ContainsKey(string.GetHashCode());
[storage var] [first]:[ last ]:[ none ]:[average]
strHashDict 32.51 : 28.00 : 20.83 : 27.11
4.B Test: .ContainsKey(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strDict 36.45 : 32.13 : 22.39 : 30.32
5.B Test: .Contains(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 37.29 : 34.33 : 23.56 : 31.73
strList 23.34 : 147.75 : 153.04 : 108.04
strArray 349.62 : 460.19 : 459.99 : 423.26
6.B Test: .Any(p => p == string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 76.26 : 355.09 : 361.31 : 264.22
strList 70.20 : 332.33 : 341.79 : 248.11
strArray 37.23 : 234.70 : 251.81 : 174.58
For cases A and B looks like tests 2 and 3 have the advantage.
However, HashSet.Contains(string) is very efficient, not effected by list contents and has a clear syntax...might be the best choice.
Yes, it is true, I have no life.
Are you actually running this 200,000 times in production code? You may want to consider hash checks as a faster negative-check if so.
If the 200,000 times was just to illustrate the difference, then I wouldn't worry about it. It's only a 0.02 millisecond increase in time.
Contains
is more general-purpose than testing static strings, so there is a small amount of overhead. Unless this code is a bottleneck as Mark mentioned, it's not worth optimizing. There's a famous quote in CS: "premature optimization is the root of all evil." The quote isn't quite accurate, but it's a good reminder of the ultimate optimization guideline: measure first.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With