Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A faster replacement to the Dictionary<TKey, TValue>

I need a fast replacement for the System.Collections.Generic.Dictionary<TKey, TValue>. My application should be really fast. So, the replacement should support:

  • Generics
  • Add
  • Get
  • Contains

... and that's it. I don't need any support in LINQ or anything. And it should be fast.

A simple code like:

Stopwatch stopWatch = Stopwatch.StartNew();  Dictionary<string, string> dictionary = new Dictionary<string, string>(); dictionary.Add("fieldName", "fieldValue"); dictionary.Add("Title", "fieldVaaaaaaaaaaaaaaaaalue");  Console.WriteLine(stopWatch.Elapsed); 

... prints 00:00:00.0001274, which is alot of time for me, because my application is doing many other things, some of them from old slow libraries that I must to use and are not dependent on me.

Any ideas on how to implement a faster one?

Thank you.

like image 483
Alon Gubkin Avatar asked Dec 08 '09 20:12

Alon Gubkin


People also ask

What is TKey and TValue?

TKey. The type of the keys in the dictionary. TValue. The type of the values in the dictionary. Inheritance.

What is a difference between list T and dictionary TKey TValue >?

The Dictionary maps a key to a value and cannot have duplicate keys, whereas a list just contains a collection of values.

Which is faster dictionary or list?

A dictionary is 6.6 times faster than a list when we lookup in 100 items.

Why dictionary is fast in C#?

A dictionary uses a hash table, it is a great data structure as it maps an input to a corresponding output almost instantaneously, it has a complexity of O(1) as already pointed out which means more or less immediate retrieval.


2 Answers

Chances are you're seeing JIT compilation. On my box, I see:

00:00:00.0000360 00:00:00.0000060 

when I run it twice in quick succession within the same process - and not in the debugger. (Make sure you're not running it in the debugger, or it's a pointless test.)

Now, measuring any time that tiny is generally a bad idea. You'd need to iterate millions of times to get a better idea of how long it's taking.

Do you have good reason to believe it's actually slowing down your code - or are you basing it all on your original timing?

I doubt that you'll find anything significantly faster than Dictionary<TKey, TValue> and I'd be very surprised to find that it's the bottleneck.

EDIT: I've just benchmarked adding a million elements to a Dictionary<TKey, TValue> where all the keys were existing objects (strings in an array), reusing the same value (as it's irrelevant) and specifying a capacity of a million on construction - and it took about 0.15s on my two-year-old laptop.

Is that really likely to be a bottleneck for you, given that you've already said you're using some "old slow libraries" elsewhere in your app? Bear in mind that the slower those other libraries are, the less impact an improved collection class will have. If the dictionary changes are only accounting for 1% of your overall application time, then even if we could provide an instantaneous dictionary, you'd only speed up your app by 1%.

As ever, get a profiler - it'll give you a much better idea of where your time is going.

like image 92
Jon Skeet Avatar answered Oct 02 '22 14:10

Jon Skeet


I agree with Jon Skeet's supposition that this is most likely JIT compilation.

That being said, I wanted to add some other information here:

Most of the speed issues relating to using Dictionary<T,U> are not related to the implementation of Dictionary. Dictionary<T,U> is VERY fast, out of the box. It would be difficult to beat it.

Speed issues relating to Dictionary instances are almost always actually hash code implementation issues. If you're having speed issues when using Dictionary<MyCustomClass,MyValue>, revisit the GetHashCode() implementation you have defined on MyCustomClass. This is even more critical if you're using a custom struct as your key.

In order to get good performance out of Dictionary, GetHashCode() should be:

  1. Fast
  2. Able to provide hash codes that generate few conflicts. Unique instances should, when possible, generate unique hash values.

If you get that right, I think you'll be very happy with the default Dictionary implementation.

like image 38
Reed Copsey Avatar answered Oct 02 '22 15:10

Reed Copsey