Memory-mapped file IList implementation, for storing large datasets "in memory"?

I need to perform operations chronologically on a huge time series implemented as an IList. The data is ultimately stored in a database, but it would not make sense to submit tens of millions of queries to the database.

Currently the in-memory IList throws an OutOfMemoryException when I try to store more than 8 million (small) objects, whereas I need to deal with tens of millions.

After some research, it looks like the best way to do it would be to store data on disk and access it through an IList wrapper.

Memory-mapped files (introduced in .NET 4.0) seem like the right mechanism to use, but I wonder what the best way is to write a class that implements IList (for easy access) and internally deals with a memory-mapped file.

I am also curious to hear if you know about other ways! I thought, for example, of an IList wrapper using data from db4o (someone mentioned here using a memory-mapped file as the IoAdapterFile, though using db4o probably adds a performance cost compared to dealing directly with the memory-mapped file).

I have come across this question asked in 2009, but it did not yield useful answers or serious ideas.

asked Sep 14 '11 by Erwin Mayer


2 Answers

I found this PersistentDictionary<>, but it only works with strings, and from reading the source code I am not sure it was designed for very large datasets.

More scalable (up to 16 TB), the ESENT PersistentDictionary<> uses the ESENT database engine built into Windows (XP and later) and can store all serializable objects made of simple types.
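As a rough usage sketch (I am going from memory on the exact namespace and package name, and the directory name is arbitrary), it behaves like an ordinary IDictionary<TKey, TValue>, but the data lives in an ESENT database on disk rather than on the GC heap:

```csharp
using System;
using Microsoft.Isam.Esent.Collections.Generic;

class Program
{
    static void Main()
    {
        // The constructor takes a directory in which the ESENT database files are created.
        using (var prices = new PersistentDictionary<long, double>("TicksDb"))
        {
            prices[20110914L] = 42.17;            // written to disk, not kept in managed memory
            Console.WriteLine(prices[20110914L]); // read back through the ESENT cache
        }
    }
}
```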

Disk Based Data Structures (including Dictionary, List and Array with an "intelligent" serializer) looked exactly like what I was looking for, but it did not run smoothly with extremely large datasets, especially as it does not yet make use of the "native" .NET MemoryMappedFiles, and support for 32-bit systems is experimental.

Update 1: I ended up implementing my own version that makes extensive use of .NET MemoryMappedFiles; it is very fast and I will probably release it on CodePlex once I have adapted it for more general-purpose usage.
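For illustration only (this is not the implementation mentioned above), the general idea can be sketched roughly like this: a long-indexed list of fixed-size structs stored in a memory-mapped file, with reads and writes done through a view accessor at computed offsets. Wrapping it in IList<T> is then mostly mechanical for counts that fit in an int.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Sketch: a long-indexed, fixed-capacity list of structs (no reference fields)
// backed by a memory-mapped file.
public sealed class MmfList<T> : IDisposable where T : struct
{
    private readonly MemoryMappedFile _mmf;
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly int _itemSize = Marshal.SizeOf(typeof(T));
    private readonly long _capacity;

    public long Count { get; private set; }

    public MmfList(string path, long capacityInItems)
    {
        _capacity = capacityInItems;
        _mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Create, null, capacityInItems * _itemSize);
        // Maps the whole file; a "sliding window" would create smaller views on demand instead.
        _accessor = _mmf.CreateViewAccessor();
    }

    public T this[long index]
    {
        get { T value; _accessor.Read(index * _itemSize, out value); return value; }
        set { _accessor.Write(index * _itemSize, ref value); }
    }

    public void Add(T item)
    {
        if (Count >= _capacity) throw new InvalidOperationException("Capacity exceeded.");
        this[Count++] = item;
    }

    public void Dispose()
    {
        _accessor.Dispose();
        _mmf.Dispose();
    }
}
```

Chronological passes over such a list are then plain for loops over the index, and the OS page cache decides what actually sits in RAM at any moment.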

Update 2: TeaFiles.Net also worked great for my purpose. Highly recommended (and free).

answered Oct 12 '22 by Erwin Mayer


I see several options:

  • "in-memory-DB"
    for example SQLite can be used this way - no need for any setup etc. just deploying the DLL (1 or 2) together with the app and the rest can be done programmatically
  • Load all data into temporary table(s) into the DB, with unknown (but big) amounts of data I found that this pays off really fast (and processing can usually be done inside the DB whcih is even better!)
  • use a MemoryMappedFile and a fixed structure size (array-like access via offset) but beware that physical memory is the limit except you use some sort of "sliding window" to map only parts into memory
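A minimal sketch of the first option, assuming the System.Data.SQLite provider (the table and column names are made up for illustration). With "Data Source=:memory:" the database lives in RAM; pointing the connection string at a file keeps the data on disk instead:

```csharp
using System;
using System.Data.SQLite;

class Program
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=:memory:"))
        {
            conn.Open();

            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "CREATE TABLE Ticks (Timestamp INTEGER, Price REAL)";
                cmd.ExecuteNonQuery();
            }

            // Bulk inserts are much faster inside a single transaction.
            using (var tx = conn.BeginTransaction())
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "INSERT INTO Ticks (Timestamp, Price) VALUES (@ts, @p)";
                cmd.Parameters.AddWithValue("@ts", 0L);
                cmd.Parameters.AddWithValue("@p", 0.0);

                for (int i = 0; i < 1000000; i++)
                {
                    cmd.Parameters["@ts"].Value = (long)i;
                    cmd.Parameters["@p"].Value = i * 0.01;
                    cmd.ExecuteNonQuery();
                }
                tx.Commit();
            }

            // Processing can then be pushed into the DB itself.
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT AVG(Price) FROM Ticks";
                Console.WriteLine(cmd.ExecuteScalar());
            }
        }
    }
}
```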
answered Oct 12 '22 by Yahia