Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random-access container that does not fit in memory?

I have an array of objects (say, images), which is too large to fit into memory (e.g. 40GB). But my code needs to be able to randomly access these objects at runtime.

What is the best way to do this?

From my code's point of view, it shouldn't matter, of course, if some of the data is on disk or temporarily stored in memory; it should have transparent access:

container.getObject(1242)->process();
container.getObject(479431)->process();

But how should I implement this container? Should it just send the requests to a database? If so, which one would be the best option? (If a database, then it should be free and not too much administration hassle, maybe Berkeley DB or sqlite?)

Should I just implement it myself, memoizing objects after acces sand purging the memory when it's full? Or are there good libraries (C++) for this out there?

The requirements for the container would be that it minimizes disk access (some elements might be accessed more frequently by my code, so they should be kept in memory) and allows fast access.

UPDATE: I turns out that STXXL does not work for my problem because the objects I store in the container have dynamic size, i.e. my code may update them (increasing or decreasing the size of some objects) at runtime. But STXXL cannot handle that:

STXXL containers assume that the data types they store are plain old data types (POD). http://algo2.iti.kit.edu/dementiev/stxxl/report/node8.html

Could you please comment on other solutions? What about using a database? And which one?

like image 627
Frank Avatar asked Jan 25 '10 19:01

Frank


People also ask

What is a random access container?

A Random Access Container is a Reversible Container whose iterator type is a Random Access Iterator. It provides amortized constant time access to arbitrary elements.

What are the STL containers?

An STL container is a collection of objects of the same type (the elements). Container owns the elements. Creation and destruction is controlled by the container.

What is random access in database?

Random access (more precisely and more generally called direct access) is the ability to access an arbitrary element of a sequence in equal time or any datum from a population of addressable elements roughly as easily and efficiently as any other, no matter how many elements may be in the set.

What is the fastest container in C++?

Overall, for insertions, the vector and deque are the fastest for small types and the list is the fastest for the very large types.


2 Answers

Consider using the STXXL:

The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

like image 143
James McNellis Avatar answered Nov 15 '22 07:11

James McNellis


You could look into memory mapped files, and then access one of those too.

like image 44
Liz Albin Avatar answered Nov 15 '22 08:11

Liz Albin