 

Dealing with large amounts of data in C++

I have an application that sometimes has to handle a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that threshold is about 2 GB, the amount of physical RAM.

What is a good way to handle this situation? I get a std::bad_alloc thrown from new and have tried trapping that, but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data, but it is a requirement of this application to handle this sort of large data load.

Edit: I'm testing under a 32-bit Windows system for now, but the application will run on various flavors of Windows, Sun and Linux, mostly 64-bit but some 32-bit.

The error handling is not strong: it simply wraps the main instantiation code in a try/catch block, with the catch looking for any exception, per a peer's complaint that bad_alloc couldn't be trapped every time.
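
As a side note, catching the failure close to the allocation tends to behave better than one big try/catch around main(). A hedged sketch of that idea; the chunking function and its name are illustrative, not from the actual application:

    // Hedged sketch, not the application's actual code: load data in
    // chunks so a failed allocation can be handled per-chunk rather than
    // only at the top of main(). Note that on OSs that overcommit memory
    // (e.g. Linux with the OOM killer), an allocation may "succeed" and
    // the process gets killed later, where no C++ handler can run.
    #include <cstddef>
    #include <new>
    #include <vector>

    bool try_load_chunk(std::vector<char>& out, std::size_t bytes)
    {
        try {
            out.resize(bytes);            // may throw std::bad_alloc
            // ... read `bytes` bytes of the file into `out` here ...
            return true;
        } catch (const std::bad_alloc&) {
            out.clear();
            out.shrink_to_fit();          // release anything we grabbed
            return false;                 // caller stops loading gracefully
        }
    }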

I think you guys are right: I need a memory management system that doesn't load all of this data into RAM at once.

Edit 2: Luther said it best. Thanks. For now, I just need a way to prevent a crash, which should be possible with proper exception handling. But down the road I'll be implementing that accepted solution.

asked Aug 16 '10 by Robb


2 Answers

There is the STXXL library, which offers STL-like containers for large datasets.

  • http://stxxl.sourceforge.net/

Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.


It is hard to say anything about why your application crashes, because there are numerous hiccups involved when it comes to tight memory conditions: you could hit a hard address-space limit (for example, by default 32-bit Windows gives each user process only 2GB of address space; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ ), or be eaten alive by the OOM killer (not a mythical beast; see http://lwn.net/Articles/104179/ ).

What I'd suggest in any case is to think about a way to keep the data on disk and treat main memory as a kind of level-4 cache for it. For example, if you have, say, blobs of data, wrap these in a class which can transparently load the blobs from disk when they are needed, and which registers with some kind of memory manager that can ask some of the blob holders to free their memory before memory conditions become unbearable. In effect, a buffer cache.
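
A hedged sketch of that shape, with Blob and BlobCache as invented names for illustration; a simple LRU eviction policy stands in for "ask some of the blob holders to free their memory":

    #include <cstddef>
    #include <fstream>
    #include <iterator>
    #include <list>
    #include <string>
    #include <utility>
    #include <vector>

    class BlobCache;  // forward declaration

    // Wraps one on-disk blob; loads its bytes lazily on first access.
    class Blob {
    public:
        Blob(std::string path, BlobCache& cache)
            : path_(std::move(path)), cache_(cache) {}

        const std::vector<char>& data();   // reloads from disk if evicted
        void evict() { bytes_.clear(); bytes_.shrink_to_fit(); }
        std::size_t size() const { return bytes_.size(); }

    private:
        std::string path_;
        std::vector<char> bytes_;
        BlobCache& cache_;
    };

    // Keeps total resident bytes under a budget by evicting the
    // least-recently-used blobs first.
    class BlobCache {
    public:
        explicit BlobCache(std::size_t budget) : budget_(budget) {}

        void touched(Blob* b) {
            used_.push_back(b);            // most-recently-used position
            resident_ += b->size();
            while (resident_ > budget_ && used_.size() > 1) {
                Blob* victim = used_.front();
                used_.pop_front();
                resident_ -= victim->size();
                victim->evict();           // frees RAM; data stays on disk
            }
        }

    private:
        std::size_t budget_;
        std::size_t resident_ = 0;
        std::list<Blob*> used_;
    };

    const std::vector<char>& Blob::data() {
        if (bytes_.empty()) {
            std::ifstream in(path_, std::ios::binary);
            bytes_.assign(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>());
            cache_.touched(this);  // may evict other blobs to stay in budget
        }
        return bytes_;
    }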

answered Sep 29 '22 by Nordic Mainframe


The user has the option to load in a number of files which are used in a graphical display.

The usual trick is not to load the data into memory directly, but rather to use the memory-mapping mechanism to make the files look like memory.

You need to make sure that the memory mapping is done in read-only mode, to allow the OS to evict the pages from RAM when the memory is needed for something else.
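
A minimal POSIX sketch of that, assuming the file fits in the address space (on Windows the analogous calls are CreateFileMapping/MapViewOfFile; error handling is abbreviated):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstddef>

    const char* map_file(const char* path, std::size_t& length)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return nullptr;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
        length = static_cast<std::size_t>(st.st_size);

        // A PROT_READ mapping: the OS can drop these clean file-backed
        // pages at any time and re-read them from the file, so they
        // don't pin physical RAM the way heap allocations do.
        void* p = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after closing the descriptor
        // Caller should munmap(p, length) when done with the data.
        return p == MAP_FAILED ? nullptr : static_cast<const char*>(p);
    }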

If the user selects more data than the OS can handle, the application crashes pretty hard.

Depending on the OS, it is one of two things: either the application is missing some memory-allocation error handling, or you really are hitting the limit of available virtual memory.

Some OSs also have an administrative limit on how large an application's heap can grow.
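
On POSIX systems, one such limit can be inspected with getrlimit(); a small sketch (RLIMIT_AS is the address-space cap, and which limits are actually enforced varies by OS):

    #include <sys/resource.h>
    #include <cstdio>

    int main()
    {
        rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) == 0) {
            if (rl.rlim_cur == RLIM_INFINITY)
                std::puts("no address-space limit set");
            else
                std::printf("address-space limit: %llu bytes\n",
                            static_cast<unsigned long long>(rl.rlim_cur));
        }
        return 0;
    }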

On my test system, that number is about the 2 gigs of physical RAM.

It sounds like:

  • your application is 32-bit, and
  • your OS uses the 2GB/2GB virtual memory split.

To avoid hitting the limit, you need to:

  • upgrade your app and OS to 64-bit or
  • tell the OS (the /3GB boot switch on Windows; most Linux distributions already have it) to use a 3GB/1GB virtual memory split. Some 32-bit OSs use a 2GB/2GB memory split: 2GB of virtual memory for the kernel and 2GB for the user application. A 3/1 split means 1GB of VM for the kernel and 3GB for the user application. On Windows, the executable must also be linked with /LARGEADDRESSAWARE to actually use the extra address space.
answered Sep 29 '22 by Dummy00001