 

Large Object Heap friendly IDictionary

We have an application that holds large numbers of objects in several Dictionaries, some of which grow continually during the lifetime of the app (it's a trading application with lots of instruments and continually growing orders/trades).

We are having problems with OutOfMemoryExceptions due to fragmentation of the large object heap.

To counter this I've tried to write a 'large' dictionary that is implemented as a two-level dictionary, where none of the leaf dictionaries is large enough to be allocated on the LOH. I've used a consistent hashing algorithm to avoid having to rehash the entire dictionary when a single bucket becomes too large. The consistent hashing 'circle' is a TreeDictionary from the C5 collections library.
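Roughly, the lookup side of that looks like the sketch below. It's a simplified illustration rather than the actual code (which is linked in the update): SortedDictionary stands in for C5's TreeDictionary, the class and member names are mine, and splitting/rebalancing of over-full leaves is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified consistent-hashing ring: each node on the ring owns a slice of
// the hash space and maps to one small leaf dictionary. SortedDictionary is
// only a stand-in for C5's TreeDictionary; a real implementation wants a
// tree with successor queries instead of the linear scan below.
class HashRing<TKey, TValue>
{
    private readonly SortedDictionary<uint, Dictionary<TKey, TValue>> _ring =
        new SortedDictionary<uint, Dictionary<TKey, TValue>>();

    public void AddNode(uint ringPosition)
    {
        _ring[ringPosition] = new Dictionary<TKey, TValue>();
    }

    // Clockwise walk: the first node at or after the key's ring position
    // owns the key; wrap around to the first node if we fall off the end.
    public Dictionary<TKey, TValue> LeafFor(TKey key)
    {
        uint position = (uint)key.GetHashCode();
        foreach (var node in _ring)
        {
            if (node.Key >= position)
                return node.Value;
        }
        return _ring.Values.First();
    }
}
```

The point of the ring is that when one leaf grows past the size limit, you only insert a new node next to it and redistribute that one leaf's entries, instead of rehashing the whole collection.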

My question is, are there any better data structures (or perhaps better implementations of the one I described) for C#?

Update

This is the implementation for the 'large' dictionary: https://gist.github.com/956621

I understand it's not foolproof: neither the LOH threshold, nor the size of each Dictionary entry, nor the scaling algorithm is in the specification. However, this is currently the best I can think of to avoid the application blowing up mid-day.
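For what it's worth, this is the back-of-the-envelope sizing I'm using for the leaves. Every number in it is an undocumented implementation detail (entry layout, 64-bit references, the 85,000-byte threshold), so it's a sanity check rather than a guarantee:

```csharp
// Rough sizing for a leaf Dictionary<TKey, TValue> with reference-typed
// key and value on 64-bit. All figures are undocumented implementation
// details and may change between CLR versions.
static class LeafSizing
{
    // An internal entry is roughly: 4 bytes cached hash code + 4 bytes
    // 'next' index + 8 bytes key reference + 8 bytes value reference,
    // plus a 4-byte slot in the separate buckets array.
    private const int ApproxEntryBytes = 24;

    // Objects at or above this size go on the LOH (not in the spec).
    private const int LohThresholdBytes = 85000;

    // Keep each leaf comfortably below this so its internal arrays never
    // reach the LOH: 85000 / 24 is roughly 3,500 entries.
    public const int MaxLeafEntries = LohThresholdBytes / ApproxEntryBytes;
}
```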

asked May 05 '11 by SimonC


1 Answer

A dictionary is an unfortunate data structure when it is the largest one in your application. The hashtable is doubled in size when it becomes too full, and that requires 150% over-allocation during the resize, right at the critical time. The hashtable works superbly enough when it's gigantic, but it requires contiguous allocation, which stresses the heap algorithms.

You can diminish these disadvantages with multi-level hashtables, for example using one byte of the hashcode as an index into 256 hashtables. This adds some overhead for sure, but more importantly, this and other strategies are filled with peril: by fiddling with the randomness, such as it is, of the hashcodes you get, you can potentially make things much, much worse performance-wise. Using this approach requires a good theoretical foundation and solid empirical testing. But it can work.
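A minimal sketch of that idea, assuming the hash codes are reasonably well mixed in the byte you pick (the class and member names here are hypothetical, not from any library):

```csharp
using System.Collections.Generic;

// Splits one huge dictionary into 256 smaller ones, selected by one byte of
// the key's hash code. Only the operations needed for the sketch are shown.
class SegmentedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] _segments;

    public SegmentedDictionary()
    {
        _segments = new Dictionary<TKey, TValue>[256];
        for (int i = 0; i < _segments.Length; i++)
            _segments[i] = new Dictionary<TKey, TValue>();
    }

    // The low byte of the hash code picks the segment; the full key is then
    // looked up in that segment as usual.
    private Dictionary<TKey, TValue> SegmentFor(TKey key)
    {
        return _segments[key.GetHashCode() & 0xFF];
    }

    public TValue this[TKey key]
    {
        get { return SegmentFor(key)[key]; }
        set { SegmentFor(key)[key] = value; }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return SegmentFor(key).TryGetValue(key, out value);
    }
}
```

Each segment's internal arrays stay roughly 256 times smaller than a single dictionary's would, so they stay off the LOH far longer and each resize copies far less data. The peril is exactly the one described above: if the hash codes cluster in the byte you chose, a few segments end up holding almost everything.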

Another strategy is to pre-allocate the biggest data structure for the worst case, and to allocate it early. No fine-grained allocation is necessary, but now you face the spectre of catastrophic failure if it should ever run out. It's an option.
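In its simplest form that is just the capacity constructor, called once at startup while the heap is still unfragmented (the capacity and the types below are placeholders, not a recommendation for any particular size):

```csharp
using System.Collections.Generic;

class Trade { /* placeholder for the application's own trade type */ }

static class PreallocatedStore
{
    // Passing a capacity makes the Dictionary allocate its internal arrays
    // once, sized for that many entries, so no further resizes (and no fresh
    // LOH allocations) happen until the capacity is exceeded.
    // 2000000 is a placeholder for a measured worst case.
    public static readonly Dictionary<long, Trade> Trades =
        new Dictionary<long, Trade>(2000000);
}
```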

answered Sep 30 '22 by Rick Sladkey