Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast string suffix checking in C# (.NET 4.0)?

Tags:

string

c#

What is the fastest method of checking string suffixes in C#?

I need to check each string in a large list (anywhere from 5000 to 100000 items) for a particular term. The term is guaranteed never to be embedded within the string. In other words, if the string contains the term, it will be at the end of the string. The string is also guaranteed to be longer than the suffix. Cultural information is not important.

These are how different methods performed against 100000 strings (half of them have the suffix):

 1.  Substring Comparison - 13.60ms
 2.  String.Contains - 22.33ms
 3.  CompareInfo.IsSuffix - 24.60ms
 4.  String.EndsWith - 29.08ms
 5.  String.LastIndexOf - 30.68ms

These are average times. [Edit] Forgot to mention that the strings also get put into separate lists, but this is not important. It does add to the running time though.

On my system substring comparison (extracting the end of the string using the String.Substring method and comparing it to the suffix term) is consistently the fastest when tested against 100000 strings. The problem with using substring comparison though is that Garbage Collection can slow it down considerably (more than the other methods) because String.Substring creates new strings. The effect is not as bad in .NET 4.0 as it was in 3.5 and below, but it is still noticeable. In my tests, String.Substring performed consistently slower on sets of 12000-13000 strings. This will obviously differ between systems and implementations.

[EDIT] Benchmark code: http://pastebin.com/smEtYNYN

[EDIT] FlyingStreudel's code runs fast, but Jon Skeet's recommendation of using EndsWith in conjunction with StringComparison.Ordinal appears to be the best option.

like image 684
ilitirit Avatar asked Feb 08 '11 14:02

ilitirit


2 Answers

If that's the time taken to check 100,000 strings, does it really matter?

Personally I'd use string.EndsWith on the grounds that it's the most descriptive: it says exactly what you're trying to test.

I'm somewhat suspicious of the fact that it appears to be performing worst though... if you could post your benchmark code, that would be very useful. (In particular, it really shouldn't have to do as much work as string.Contains.)

Have you tried specifying an ordinal match? That may well make it significantly faster:

if (x.EndsWith(y, StringComparison.Ordinal))

Of course, you shouldn't do that unless you want an ordinal comparison - are you expecting culturally-sensitive matches? (Developers tend not to consider this sort of thing, and I very firmly include myself in that category.)

like image 141
Jon Skeet Avatar answered Sep 19 '22 20:09

Jon Skeet


Jon is absolutely right; this is potentially not an apples-to-apples comparison because different string methods have different defaults for culteral sensitivity. Be very sure that you are getting the comparison semantics you intend to in each one.

In addition to Jon's answer, I'd add that the relevant question is not "which is fastest?" but rather "which is too slow?" What's your performance goal for this code? The slowest method still finds the result in less time than it takes a movie projector to advance to the next frame, and obviously that is not noticable by humans. If your goal is that the search appears instantaneous to the user then you're done; any of those methods work. If your goal is that the search take less than a millisecond then none of those methods work; they are all orders of magnitude too slow. What's the budget?

like image 42
Eric Lippert Avatar answered Sep 20 '22 20:09

Eric Lippert