Is HashSet<T> the fastest container to look up in?

Tags:

I need to check that specific string contains in the set of others:

private bool Contains(string field)
{
   return this.Fields.Contains(field); // HashSet<string> local property
}

What is the best type of container to use if only one task of it - to hold a number of strings and check does another one is into or does not?

446

asked Feb 13 '10 20:02

2 Answers

Yes, HashSet is perfect for this since it contains one value to look up unlike a Dictionary which requires a key and a value.

127

answered Sep 23 '22 01:09

Ahmad Mageed

Does HashSet work? Sure. But that's not the question you asked. You asked for the fastest possible lookup.

Is it the fastest possible? No, of course not, not by any measure.

First off, in order to talk about "fastest" we need to describe precisely what "fastest" means. Do you mean:

smallest worst possible case timing
smallest average timing averaged over many timings
smallest average timing given a particular usage pattern
something else

? Please clarify precisely what "fastest possible" means. We can devise you an algorithm which is the in theory fastest possible only if we know precisely what fastest possible means to you.

For example, suppose you are writing a compiler. Something we have to do all the time in compilers is check whether a particular string is in a list of strings. Perhaps we are checking to see if a string is a keyword, so we have to look up whether a given string is inside the set {"int", "double", "for", "foreach", "class" ... }

We could put those in a hash set and get decent performance. But if we wanted the best possible performance we could do a lot better. We could, for example, do an analysis of a few billion lines of existing source code to find out which keywords were most common and which were least common, and then write a custom hash table that optimized for (1) rapidly rejecting things that were not keywords at all, and (2) rapidly recognizing the most common keywords at the expense of recognizing other keywords.

Note that this requires static analysis; though it performs well on typical cases, it performs poorly on those rare cases where there are lots of rare keywords used. Another approach we could take would be to write a self tuning hash table that dynamically identified when particular strings were being searched for frequently.

Consider, for example, if you are writing an implementation of the JScript runtime. We frequently must look for a string in a set of strings:

for(i = 0; i < 10; ++i) { foo.bar(i); }

Here we must look up the string "bar" inside the object identified by "foo" ten times. The hash table inside "foo" that implements that lookup notices the first time through the loop that "bar" has been used, so it dynamically tweaks the hash table structure so that the second time through the loop, the lookup is faster. This is the strategy we employed in our implementation of JScript.

Now, that optimizes the case for loops, but it makes this case potentially slower than it could be:

for(i = 0; i < 10; ++i) { foo.bar(i); foo.blah(i); foo.abc(i); }

because we don't do more analysis and realize "hey, we just re-optimized this hash table three times, and now we're going to do it all again, maybe we should just leave it as it is."

Fortunately for us, we were not, like you, looking for the fastest possible lookup. We were only looking for a reasonably fast lookup.

Can you carefully and completely describe what exactly your usage case is for the fastest possible lookup? There are lots of algorithms you can use to speed up lookups, but they get very complicated.

answered Sep 23 '22 01:09

Eric Lippert

Related questions
                            
                                What is difference between File.Exists("") and FileInfo exists
                            
                                TimeSpan to Custom string like HH:mm:ss [duplicate]
                            
                                How to change whole solution's project's name in Visual Studio?
                            
                                MIssing method in System.Web.Http.ApiController.get_Request()
                            
                                Unexpected end of Stream, the content may have already been read by another component. Microsoft.AspNetCore.WebUtilities.MultipartReaderStream
                            
                                Validate String against USPS State Abbreviations
                            
                                Large Arrays, and LOH Fragmentation. What is the accepted convention?
                            
                                WPF: Binding to commands in code behind
                            
                                How to quickly save/load class instance to file
                            
                                How to programmatically set selected Panorama item in WP7
                            
                                Method of removing unnecessary namespaces in .NET Applications?
                            
                                How to save workbook without showing save dialog with Excel interop?
                            
                                C# Executable Executing directory
                            
                                Generate HTML table from list of generic class with specified properties
                            
                                get common elements in lists in C#
                            
                                Getting Windows serial number (was: Getting MachineGuid from Registry)
                            
                                How to do proper Parallel.ForEach, locking and progress reporting
                            
                                Why are declarations necessary
                            
                                int.TryParse = null if not numeric?
                            
                                Random playlist algorithm

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is HashSet<T> the fastest container to look up in?

Tags:

c#

.net

contains

hashset

abatishchev

People also ask

2 Answers

Ahmad Mageed

Eric Lippert

Recent Activity

Donate For Us