I enjoy developing algorithms using the STL, however, I have this recurring problem where my data sets are too large for the heap.
I have been searching for drop-in replacements for STL containers and algorithms which are disk-backed, i.e. the data structures on stored on disk rather than the heap.
A friend recently pointed me towards stxxl. Before I get too involved with it... Are any other disk-backed STL replacements available that I should be considering?
NOTE: I'm not interested in persistence or embedded databases. Please don't mention boost::serialization, POST++, Relational Template Library, Berkeley DB, sqlite, etc. I am aware of these projects and use them when they are appropriate for my purposes.
UPDATE: Several people have mentioned memory-mapping files and using a custom allocator, good suggestions BTW, but I would point them to the discussion here where David Abraham suggests that custom iterators would be needed for disk-backed containers. Meaning the custom allocator approach isn't likely to work.
I have implemented some thing very similar. Implementing the iterators is the most challenging. I used boost::iterator_facade to implement the iterators. Using boost::iterator_facade
you can easy adapt any cached on disk data structures to have a STL container interface.
I've never had to do anything quite like this, but It might be possible to do what you want to do by writing a custom allocator that makes use of a memory mapped files to back your data.
See boost::interprocesses for docs on their easy to use implementation of memory mapped files, this Dr. Dobbs article for a detailed discussion on writing allocators, and this IEEE Software column for a description of the problem and example code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With