
Replacement .net Dictionary

Given (Simplified description)

One of our services holds a lot of instances in memory, about 85% of which are unique. We need very fast key-based access to these items, as they are queried very often within a single stack / call. This single context is extremely performance-optimized.

So we started to put them into a dictionary. The performance was OK.

Access to the items as fast as possible is the most important thing in this case. It is ensured that there are no write operations when reads occur.

Problem

In the meantime we have hit the limit on the number of items a dictionary can store.

The array dimensions have exceeded the supported range.
  at System.Collections.Generic.Dictionary`2.Resize(Int32 newSize, Boolean forceNewHashCodes)
  at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)

(The message is translated from the German original.)

Solutions like Memcached are just too slow in this specific case. It is an isolated, very specific use case encapsulated in a single service.

So we are looking for a replacement of the dictionary for this specific scenario.

Currently I can't find one supporting this. Am I missing something? Can someone point me to one?

As an alternative, if none exists, we are thinking about implementing one ourselves.

We thought about two possibilities: building it from scratch or wrapping multiple dictionaries.

Wrapping multiple dictionaries

When an item is searched for, we could look at the key's hash code and use its starting number like an index into a list of wrapped dictionaries. Although this seems easy, it smells to me, and it would mean that the hash code is calculated twice (once by us and once by the inner dictionary). This scenario is really, really performance-critical.

I know that replacing a base type like the dictionary is the absolute last resort, and I want to avoid it. But currently it looks like there is no way to make the objects more unique, to get the performance of a dictionary from a database, or to save performance somewhere else.

I'm also aware of the advice to "beware of optimizations", but lower performance would hit the business requirements behind this very badly.

Boas Enkler asked Feb 25 '16 08:02




2 Answers

Before I finished reading your question, the simple approach of multiple dictionaries came to mind. But you know this solution already. I am assuming you are really hitting the maximum number of items in a dictionary, not some other limit.

I would say go for it. I do not think you should be worried about computing a hash twice. If the keys are long and getting the hash really is a time-consuming operation (which I doubt, but can't be sure, as you did not mention what the keys are), you do not need to use the whole key for your hash function. Just pick whatever part you are OK with processing in your own hashing and distribute the items based on that.
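If hashing the key does turn out to be expensive, one way to pay for it only once is to wrap the key and store its hash. This is only a minimal sketch: it assumes string keys (the question does not say what the keys are), and the type name CachedKey is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key wrapper: hashes the underlying string once, in the
// constructor, and returns the stored value on every GetHashCode() call.
public struct CachedKey : IEquatable<CachedKey>
{
    private readonly string value;
    private readonly int hash;

    public CachedKey(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        this.value = value;
        this.hash = value.GetHashCode();
    }

    public string Value { get { return value; } }

    public override int GetHashCode() { return hash; }

    // Compare the cheap stored hash first, then the full string.
    public bool Equals(CachedKey other)
    {
        return hash == other.hash
            && string.Equals(value, other.value, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is CachedKey && Equals((CachedKey)obj);
    }
}
```

With a Dictionary<CachedKey, TValue>, both your own first-level hashing and the inner dictionary call GetHashCode() on the wrapper and get the stored value back. Whether this is a net win depends on how often the same key instance is reused, so it is worth measuring.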

The only thing you need to make sure of here is that the items are spread evenly among your multiple dictionaries. How hard this is to achieve really depends on what your keys are. If they were completely random numbers, you could just use the first byte and it would be fine (unless you needed more than 256 dictionaries). If they are not random numbers, you have to think about the distribution in their domain and write your first hash function so that it achieves an even distribution.
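To make the idea concrete, here is a rough sketch of such a wrapper. The class name ShardedDictionary, the shard count parameter and the bit-mixing step are my own assumptions, not something from the question, and the sketch relies on the question's guarantee that no writes happen while reads occur.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that spreads entries over several inner dictionaries.
// Not thread-safe for concurrent writes.
public sealed class ShardedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] shards;
    private readonly IEqualityComparer<TKey> comparer;

    public ShardedDictionary(int shardCount, IEqualityComparer<TKey> comparer = null)
    {
        if (shardCount <= 0) throw new ArgumentOutOfRangeException(nameof(shardCount));
        this.comparer = comparer ?? EqualityComparer<TKey>.Default;
        shards = new Dictionary<TKey, TValue>[shardCount];
        for (int i = 0; i < shardCount; i++)
            shards[i] = new Dictionary<TKey, TValue>(this.comparer);
    }

    // Total number of entries; long because the sum may exceed int.MaxValue.
    public long Count
    {
        get
        {
            long total = 0;
            foreach (var shard in shards) total += shard.Count;
            return total;
        }
    }

    public void Add(TKey key, TValue value)
    {
        ShardFor(key).Add(key, value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return ShardFor(key).TryGetValue(key, out value);
    }

    // First-level hash: mix the bits of the key's hash code so that keys
    // with poorly spread hash codes still reach all shards.
    private Dictionary<TKey, TValue> ShardFor(TKey key)
    {
        unchecked
        {
            uint h = (uint)comparer.GetHashCode(key);
            h ^= h >> 16;
            h *= 0x45d9f3bu;
            h ^= h >> 16;
            return shards[h % (uint)shards.Length];
        }
    }
}
```

The hash code is computed once in ShardFor and once more inside the inner dictionary, which is the double hashing mentioned in the question. For cheap hash functions this should be small next to the lookup itself, but only a benchmark on the real keys can confirm that.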

Wapac answered Sep 27 '22 22:09



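I've looked at the implementation of the .NET Dictionary, and since its count and indexes are Int32 values, it seems like you should be able to store up to 2^31 - 1 values in your dictionary in principle. (Next to the array of buckets there is a single array that stores all items, and each bucket's chain is linked through indexes into that array. That single array might be the limiting factor, since the Resize call in your stack trace is where it gets reallocated.)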
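A rough back-of-the-envelope check of how big that single array gets. The numbers are assumptions, not measurements: a 64-bit process, reference-type keys and values (8 bytes each), an entry layout of two Int32 fields plus key and value, no padding, and the default 2 GB limit on a single object in the .NET Framework. The helper name EntryArrayEstimate is made up.

```csharp
using System;

// Hypothetical helper for estimating how many entries fit in one array.
public static class EntryArrayEstimate
{
    // Assumed entry layout: int hashCode + int next + key + value.
    public static long BytesPerEntry(int keyBytes, int valueBytes)
    {
        return 4 + 4 + keyBytes + valueBytes;
    }

    // Largest entry count whose array stays within maxArrayBytes.
    public static long MaxEntries(long maxArrayBytes, int keyBytes, int valueBytes)
    {
        return maxArrayBytes / BytesPerEntry(keyBytes, valueBytes);
    }
}
```

Under those assumptions, EntryArrayEstimate.MaxEntries(int.MaxValue, 8, 8) comes out at roughly 89 million entries. Because the dictionary grows by roughly doubling its capacity, the failing resize can already happen at around half that count, which is far below 2^31.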

If you haven't added anywhere near that many values, you are probably hitting the size limit of that array rather than a limit on the items in one bucket: the chains are walked in a loop, so they are not bounded by the stack size. Even so, you should double-check that your hashing function spreads the items evenly over the dictionary, because long chains slow down lookups. See this answer for more info: What is the best algorithm for an overridden System.Object.GetHashCode?
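One simple way to check the spread is to count how many keys fall into each bucket for a sample of your real keys. This is a minimal sketch: the helper name HashSpread is made up, and it assigns buckets by hash code modulo the bucket count, which may differ in detail from what the dictionary does internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical diagnostic helper for inspecting hash code distribution.
public static class HashSpread
{
    // Returns how many of the given keys land in each of bucketCount
    // buckets when assigned by (hash code mod bucketCount).
    public static int[] BucketLoads<T>(IEnumerable<T> keys, int bucketCount)
    {
        if (keys == null) throw new ArgumentNullException(nameof(keys));
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));

        var loads = new int[bucketCount];
        var comparer = EqualityComparer<T>.Default;
        foreach (var key in keys)
        {
            // Clear the sign bit so the modulo result is never negative.
            int h = comparer.GetHashCode(key) & 0x7FFFFFFF;
            loads[h % bucketCount]++;
        }
        return loads;
    }
}
```

If a few buckets hold far more keys than the average, the hash function is the first thing to look at. A prime bucket count gives a more honest picture than a round number for keys that follow a regular pattern.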

Roy T. answered Sep 27 '22 23:09
