I want to make sure that C# string does not contain specific characters.
I am using string.IndexOfAny(char[])
, in my opinion Regex would be slower in this task. Is there a better way to accomplish this? Speed is critical in my application.
Ran a quick benchmark on IndexOf
vs IndexOfAny
vs Regex
vs Hashset
.
500 word lorem ipsum haystack, with two character needle.
Tested with both needles in haystack, one in haystack, neither in haystack.
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOfAny(needles);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestRegex(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Regex regex = new Regex(string.Join("|", needles));
for (int i = 0; i < 1000000; i++)
{
Match m = regex.Match(haystack);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOf(needles[0]);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestHashset(string haystack, char[] needles)
{
HashSet<char> specificChars = new HashSet<char>(needles.ToList());
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
bool notContainsSpecificChars = !haystack.Any(specificChars.Contains);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
Result for 1,000,000 iterations:
Index of : 28/2718/2711
Index of Any : 153/141/17561
Regex of : 1068/1102/92324
Hashset : 939/891/111702
Notes:
Overall, regex
is slower then indexofany
by upto 10x depending on the haystack and needle sizes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With