
A dictionary that can save its elements accessed less often to a disk

In my application I use a dictionary (supporting add, remove, update and lookup) whose keys and values are, or can be made, serializable (the values can be quite large object graphs). The dictionary has grown so large that holding it entirely in memory occasionally triggers an OutOfMemoryException (sometimes in the dictionary methods, sometimes in other parts of the code).

I tried replacing the dictionary completely with a database, but performance dropped to an unacceptable level.

Analysis of the dictionary usage patterns showed that a smaller part of the values is usually "hot" (accessed quite often), while the rest (the larger part) is "cold" (accessed rarely or never). It is difficult to tell, when a new value is added, whether it will be hot or cold. Moreover, some values may migrate back and forth between the hot and cold parts over time.

I think I need a dictionary implementation that can flush its cold values to disk on a low-memory event, reload some of them on demand, and keep them in memory until the next low-memory event, when their hot/cold status is re-assessed. Ideally, the implementation should adjust the sizes of its hot and cold parts and the flush interval according to the application's memory usage profile, to maximize overall performance. Because several instances of the dictionary exist in the application (with different key/value types), I think they might need to coordinate their workflows.

Could you please suggest how to implement such a dictionary?

Liu Jin Tsai asked Jul 21 '13 19:07



1 Answer

Compile for 64 bit, deploy on 64 bit, add memory. Keep it in memory.

Before you grow your own, you could alternatively look at WeakReference: http://msdn.microsoft.com/en-us/library/ms404247.aspx. It would of course require you to rebuild the objects that were reclaimed, but one would hope that those which are reclaimed are not used much. It comes with the caveat that its own guidelines say: "Avoid using weak references as an automatic solution to memory management problems. Instead, develop an effective caching policy for handling your application's objects."

Of course, you can ignore that guideline and write your code to account for it.
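To make the weak-reference idea concrete, here is a minimal sketch. It is written in Python rather than .NET, purely as an illustration of the pattern; `WeakCache`, `Payload` and the `rebuild` callback are hypothetical names, not part of any library. The cache holds only weak references, so an entry survives while something else still uses it and is rebuilt on the next lookup after it has been reclaimed.

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large object graph.

    A small class is used because plain dicts and lists cannot be
    weakly referenced in Python.
    """

    def __init__(self, key):
        self.key = key
        self.data = [key] * 1000


class WeakCache:
    """Cache that never keeps its values alive on its own."""

    def __init__(self, rebuild):
        self._rebuild = rebuild                        # called on a miss to recreate the value
        self._items = weakref.WeakValueDictionary()    # entries vanish once the value is reclaimed
        self.rebuilds = 0                              # counter, only to make the behaviour visible

    def get(self, key):
        value = self._items.get(key)
        if value is None:
            # Either never cached or already reclaimed: pay the rebuild cost.
            value = self._rebuild(key)
            self.rebuilds += 1
            self._items[key] = value
        return value
```

Note the timing difference: CPython usually reclaims an object as soon as its last strong reference goes away, whereas in .NET a weakly referenced object lives until a garbage collection gets to it. In both cases the cache gives you no say over which entries are dropped, which is the reason the guideline above recommends a real caching policy instead.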

You can implement the caching policy so that entries are saved to the database on expiry and, on fetch, loaded from it and cached again. Use a sliding expiry, of course, since you are concerned with keeping the most-used entries.
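A rough sketch of that policy, again in Python as a language-neutral illustration: `SlidingExpiryCache` is a hypothetical name, `store` is any dict-like object standing in for the database, and the injectable `clock` is there only so the behaviour can be tested. Every access resets an entry's timer (the sliding part), and entries whose timer has run out are written to the store and dropped from memory.

```python
import time


class SlidingExpiryCache:
    """In-memory cache with sliding expiry that spills expired entries to a store."""

    def __init__(self, store, ttl_seconds, clock=time.monotonic):
        self._store = store      # dict-like backing store (assumed stand-in for the database)
        self._ttl = ttl_seconds
        self._clock = clock
        self._hot = {}           # key -> (value, time of last access)

    def put(self, key, value):
        self._hot[key] = (value, self._clock())

    def get(self, key):
        if key in self._hot:
            value, _ = self._hot[key]
        else:
            value = self._store[key]   # raises KeyError if the key is unknown everywhere
        # Sliding expiry: every access restarts the entry's timer.
        self._hot[key] = (value, self._clock())
        return value

    def evict_expired(self):
        """Write entries idle for at least ttl_seconds to the store; return their keys."""
        now = self._clock()
        expired = [k for k, (_, last) in self._hot.items() if now - last >= self._ttl]
        for key in expired:
            value, _ = self._hot.pop(key)
            self._store[key] = value
        return expired

    def hot_keys(self):
        return set(self._hot)
```

The sketch leaves out removal, thread safety and the question of when `evict_expired` runs (a timer, or a low-memory notification); those depend on the application.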

Do remember, however, that most used vs. heaviest is a trade-off. Losing an object 10 times a day that takes 5 minutes to restore would annoy users much more than losing an object 10000 times that takes just 5 ms to restore.
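Putting numbers on that example: the heavy object costs 10 x 5 minutes = 50 minutes of restoring per day, the light one 10000 x 5 ms = 50 seconds. The short Python sketch below does the same arithmetic and uses it as a crude eviction score; `daily_restore_ms` and `pick_victim` are hypothetical helpers, and weighting by expected reloads times restore cost is only one possible heuristic.

```python
def daily_restore_ms(losses_per_day, restore_ms):
    """Total time per day spent restoring one object, in milliseconds."""
    return losses_per_day * restore_ms


heavy = daily_restore_ms(10, 5 * 60 * 1000)   # 10 losses, 5 minutes each
light = daily_restore_ms(10_000, 5)           # 10000 losses, 5 ms each


def pick_victim(entries):
    """Pick the key that is cheapest to lose.

    entries maps key -> (expected reloads per day, restore cost in ms).
    """
    return min(entries, key=lambda k: entries[k][0] * entries[k][1])
```

Here `heavy` is 3,000,000 ms and `light` is 50,000 ms, so by this measure the rarely lost heavy object costs 60 times more than the frequently lost light one.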

Someone above mentioned the web cache. It does automatic memory management with callbacks, as noted; it depends on whether you want to lug that one around in your apps.

And, last but not least, look at a distributed cache. With sharding you can split that big dictionary across a few machines.
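For a feel of what sharding means here, a minimal Python sketch follows. It assumes string keys; `shard_for` and `ShardedDict` are hypothetical names, and each in-process dict merely stands in for a remote cache node. A stable hash of the key decides which shard owns it, so every client agrees on where a key lives.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedDict:
    """Dictionary split across several shards; each dict stands in for one machine."""

    def __init__(self, shard_count):
        self._shards = [{} for _ in range(shard_count)]

    def _shard(self, key):
        return self._shards[shard_for(key, len(self._shards))]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __getitem__(self, key):
        return self._shard(key)[key]

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]
```

Plain modulo sharding moves most keys to a different shard when the number of shards changes; distributed caches commonly use consistent hashing to limit that.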

cineam mispelt answered Sep 20 '22 17:09
