I need a fast method to determine if a given string is in a list of strings.
The list of strings is not known until runtime but thereafter it will not change.
I could simply have a List<String>
called strings
and then do:
if (strings.Contains(item))
However this will perform poorly if there are many strings in the list.
I could also use a HashSet<String>
, but this would necessitate calling GetHashCode
on every incoming string as well as Equals
, which would be a waste if there are e.g. only 3 strings in the list. Did I mention this needs to be fast?
I could when setting up, decide to use a List
or a HashSet
depending on the number of strings (e.g. use List for less than 10 strings, HashSet otherwise), rather like the logic in HybridDictionary
.
As the strings are unicode, a standard Trie structure will not work, although a Radix tree/Patricia trie might. Are there any good C# implementations out there with benchmarks?
Some have mentioned bypassing String
's GetHashCode
and using a faster performing hash function. Are there any benchmarks out there?
Using LINQ expressions to essentially create an optimised switch statement is a novel approach which looks very interesting.
What else would work? Setup cost is not important, just the search speed.
If it matters, the incoming string values will rarely appear in the list.
You could use a trie to hold the list of strings; tries were designed for fast retrieval. Here's one example of implementing a trie in C#.
Update: Powerpoint presentation on folded tries for Unicode and Ifo on implementation of a folded trie for Unicode (not C#)
Check out these:
Jomo Fisher - Fast Switching with LINQ
Gustavo Guerra - StaticStringDictionary - Fast Switching with LINQ revisited
Have you considered using the HashSet class (in .NET 3) instead?
Re your "when the list is small" concern; if you don't mind using non-generic collections, System.Collections.Specialized.HybridDictionary
does something like this; it encapsulates a System.Collections.Specialized.ListDictionary
when small, or a System.Collections.Hashtable
when it gets bigger (>10
). Worth a look?
Otherwise; you could perhaps use HashSet<T>
with a custom comparer? Then you can choose how expensive GetHashCode()
is...
using System;
using System.Collections.Generic;
class CustomStringComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
return string.Equals(x, y);
}
public int GetHashCode(string s) {
return string.IsNullOrEmpty(s) ? 0 :
s.Length + 273133 * (int)s[0];
}
private CustomStringComparer() { }
public static readonly CustomStringComparer Default
= new CustomStringComparer();
}
static class Program {
static void Main() {
HashSet<string> set = new HashSet<string>(
new string[] { "abc", "def", "ghi" }, CustomStringComparer.Default);
Console.WriteLine(set.Contains("abc"));
Console.WriteLine(set.Contains("abcde"));
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With