
.NET Dictionary vs. other managed custom data structures: why is the .NET Dictionary so fast? [duplicate]

I am in the middle of developing a custom persistent key-value data structure to compare against SQLite and Berkeley DB. Before writing the implementation, I wanted to find the best data structure for this purpose. I looked at a couple:

  • An open source red-black tree.
  • Mono's Dictionary implementation.

I wanted the data structures I picked to have performance numbers comparable to the .NET Dictionary.

I used a simple test: a for loop with 500k iterations for inserts, with a Stopwatch to measure insert and key lookup times.

I noticed that:

  • Berkeley DB key lookup time was about the same as the Dictionary.
  • I ran the same for-loop test against the C5 dictionary, a red-black tree implementation, and even Mono's Dictionary implementation.

Insert time: 7% slower than the .NET Dictionary.
Lookup time: 1000% slower than the .NET Dictionary. This is even slower than the lookup speed with SQLite! I ran the test with compiler optimizations turned on and still got similar results.

I realize I am comparing hash tables against trees, etc., but I am stumped by the performance discrepancy between all the data structures.

Does anybody have any ideas?

Inuka G asked Feb 16 '10 23:02


1 Answer

Two thoughts:

  1. You should make sure you are not inadvertently including JIT time in your tests - this can add a considerable amount of time to the result. You should perform two runs in the same execution and discard the first run.

  2. You should make sure that you are not running under the debugger - this can dramatically skew performance results.
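Both points can be folded into the timing harness itself. The following is a minimal sketch, not the asker's actual test: the class name `LookupBenchmark`, the method `TimeLookups`, and the use of integer keys are assumptions made for illustration. `Stopwatch` and `Debugger.IsAttached` are the standard `System.Diagnostics` members.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupBenchmark
{
    // Times `count` lookups of the keys 0..count-1 against a pre-filled map.
    // The checksum is returned so the loop's work is actually consumed.
    public static TimeSpan TimeLookups(IDictionary<int, int> map, int count, out long checksum)
    {
        checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            if (map.TryGetValue(i, out int value))
                checksum += value;
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; timings will be skewed.");

        const int N = 500_000;                  // iteration count taken from the question
        var map = new Dictionary<int, int>(N);
        for (int i = 0; i < N; i++) map[i] = i;

        TimeLookups(map, N, out _);             // first run pays the JIT cost; result discarded
        TimeSpan elapsed = TimeLookups(map, N, out long checksum);

        Console.WriteLine($"{N} lookups: {elapsed.TotalMilliseconds:F2} ms (checksum {checksum})");
    }
}
```

Because `TimeLookups` accepts any `IDictionary<int, int>`, the same method can time a tree-based or third-party implementation under identical conditions, provided it implements that interface. Running a Release build outside the debugger is still needed for the numbers to mean anything.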

Aside from that, any performance differences you see may very well be the result of the difference in performance between a hash table and a tree. A lookup in a balanced tree, such as a red-black tree, takes O(log(n)) time; an unbalanced tree can degrade to O(n) in the worst case. Hash tables, meanwhile, can approach O(1) time for lookups when hash collisions are avoided.
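The difference between tree and hash table lookups can be made visible without a stopwatch by counting key comparisons. This is a rough sketch under stated assumptions: `SortedDictionary<TKey, TValue>`, which the .NET documentation describes as a binary search tree with O(log n) retrieval, stands in for the asker's red-black tree, and the names `CountingComparer`, `CountingEquality`, and `ComparisonCount` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;

// Counts how often keys are compared while the tree is searched.
public sealed class CountingComparer : IComparer<int>
{
    public long Calls;
    public int Compare(int x, int y) { Calls++; return x.CompareTo(y); }
}

// Counts how often keys are tested for equality while the hash table is searched.
public sealed class CountingEquality : IEqualityComparer<int>
{
    public long Calls;
    public bool Equals(int x, int y) { Calls++; return x == y; }
    public int GetHashCode(int x) => x;
}

public static class ComparisonCount
{
    // Returns the average number of key comparisons per successful lookup.
    public static (double Tree, double Hash) AveragePerLookup(int n)
    {
        var cmp = new CountingComparer();
        var eq = new CountingEquality();
        var tree = new SortedDictionary<int, int>(cmp);
        var hash = new Dictionary<int, int>(eq);
        for (int i = 0; i < n; i++) { tree[i] = i; hash[i] = i; }

        cmp.Calls = 0;                          // ignore comparisons made during inserts
        eq.Calls = 0;
        for (int i = 0; i < n; i++)
        {
            tree.TryGetValue(i, out _);
            hash.TryGetValue(i, out _);
        }
        return ((double)cmp.Calls / n, (double)eq.Calls / n);
    }

    public static void Main()
    {
        var (tree, hash) = AveragePerLookup(500_000);
        Console.WriteLine($"tree: {tree:F1} comparisons per lookup, hash: {hash:F1}");
    }
}
```

The tree's count grows with the logarithm of the element count, while the hash table's stays roughly constant as long as keys spread evenly over the buckets. Each tree comparison also follows a pointer to another node, which tends to cost more in cache misses than the raw count suggests.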

I would also imagine that the .NET Dictionary class is highly optimized since it is a bread-and-butter data structure for so many different things in .NET. Also, a generic Dictionary<> may be able to avoid boxing, and therefore you may see some performance differences from that.
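The boxing point can be checked directly. The sketch below is illustrative only and assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread()` (.NET Core 3.0 or later, or .NET Framework 4.8). It compares the generic `Dictionary<int, int>` with the non-generic `Hashtable`, which stores keys and values as `object`; the class `BoxingSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingSketch
{
    // Generic dictionary: keys and values stay unboxed ints.
    public static long SumGeneric(Dictionary<int, int> map, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map[i];
        return sum;
    }

    // Non-generic Hashtable: the key `i` is boxed to object on every call,
    // and the stored value has to be unboxed with a cast.
    public static long SumBoxed(Hashtable table, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (int)table[i];
        return sum;
    }

    // Bytes allocated on the current thread while `action` runs.
    public static long AllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        const int N = 500_000;
        var map = new Dictionary<int, int>(N);
        var table = new Hashtable(N);
        for (int i = 0; i < N; i++) { map[i] = i; table[i] = i; }

        Console.WriteLine($"generic: {AllocatedBy(() => SumGeneric(map, N))} bytes allocated");
        Console.WriteLine($"boxed:   {AllocatedBy(() => SumBoxed(table, N))} bytes allocated");
    }
}
```

If a custom structure under test takes `object` keys, or value-type keys end up boxed through an interface, it pays this allocation cost on every lookup, which would account for part of the gap regardless of the underlying algorithm.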

LBushkin answered Sep 30 '22 01:09