Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good C++ array class for dealing with large arrays of data in a fast and memory efficient way?

Following on from a previous question relating to heap usage restrictions, I'm looking for a good standard C++ class for dealing with big arrays of data in a way that is both memory efficient and speed efficient. I had been allocating the array using a single malloc/HealAlloc but after multiple trys using various calls, keep falling foul of heap fragmentation. So the conclusion I've come to, other than porting to 64 bit, is to use a mechanism that allows me to have a large array spanning multiple smaller memory fragments. I don't want an alloc per element as that is very memory inefficient, so the plan is to write a class that overrides the [] operator and select an appropriate element based on the index. Is there already a decent class out there to do this, or am I better off rolling my own?

From my understanding, and some googling, a 32 bit Windows process should theoretically be able address up to 2GB. Now assuming I've 2GB installed, and various other processes and services are hogging about 400MB, how much usable memory do you think my program can reasonably expect to get from the heap?

I'm currently using various flavours of Visual C++.

Edit As per Poita's post, I've tried a std::deque, using the following test on VS2008;

#include <deque>
using namespace std;
struct V    
{
    double  data[11];
};

struct T
{
    long    data[8];    
};


void    dequeTest()
{
    deque<V> VQ;
    deque<T> TQ;

    V defV;
    T defT;

    VQ.resize(4000000,defV);
    TQ.resize(8000000,defT);
}

The total memory for the above data comes out at 608MB, were I to use straight malloc or HeapAlloc, and takes < 1 second. The deque resizes took 950MB originally, and then slowly started dropping back. 15 minutes later, dequeTest() finished, using just 6MB of memory showing for the process which probably was more to do with the run-times. I also tried populating the deque using various push options, but performance was so bad, I had to break out early. I could possibly provide a better allocator than the defualt to get a much better response, but on the face of it deque is not the class for this job. Note this could also relate to the MS VS2008 implementation of deque, as there seems to be alot in this class that is very implementation dependant when it comes to performance.

Time to write my own big array class, I reckon.

Second Edit: Allocating smaller amounts yielded 1.875GB immediately using the following;

#define TenMB 1024*1024*10

void    SmallerAllocs()
{

    size_t Total = 0;
    LPVOID  p[200];
    for (int i = 0; i < 200; i++)
    {
        p[i] = malloc(TenMB);
        if (p[i])
            Total += TenMB; else
            break;
    }
    CString Msg;
    Msg.Format("Allocated %0.3lfGB",Total/(1024.0*1024.0*1024.0));
    AfxMessageBox(Msg,MB_OK);
}

Final edit I have decided to accept Poita's post and the various comments following it, not because I'll be using the deque class directly, but more for the array as a deck of cards notion in the comments that followed. This should be straightforward to implement with O(1) random element access, based on a fixed number of elements per block, which is what i need. Thanks to all for the feedback!

like image 372
SmacL Avatar asked Mar 18 '10 19:03

SmacL


People also ask

Which search is suitable for large array?

If we want to search the element, which is the last element of the array, a linear search will start searching from the first element and goes on till the last element, so the time taken to search the element would be large. On the other hand, binary search is suitable for a large data set as it takes less time.

How is array stored in memory?

When we declare an array, space is reserved in the memory of the computer for the array. The elements of the array are stored in these memory locations. The important thing about arrays is that array elements are always stored in consecutive memory locations.

What is array and types of array in data structure?

What Are Arrays in Data Structures? An array is a linear data structure that collects elements of the same data type and stores them in contiguous and adjacent memory locations. Arrays work on an index system starting from 0 to (n-1), where n is the size of the array.

How do you assign the size of an array in C++?

In a C++ array declaration, the array size is specified after the variable name, not after the type name as in some other languages. The following example declares an array of 1000 doubles to be allocated on the stack. The number of elements must be supplied as an integer literal or else as a constant expression.


2 Answers

Have you tried using an std::deque? Unlike a std::vector, which uses one huge heap allocation, deque usually allocates in small chunks, but still provides amortised constant time indexing via operator[].

like image 52
Peter Alexander Avatar answered Oct 25 '22 11:10

Peter Alexander


Exactly how sparse is this array? If there are large amounts of empty (unused) space in it, you may want to take another approach. The answer to this question suggests an stl map.

If it isn't sparse (as mentioned in the comments), one thing you might look into since you are running on Windows is using a memory-mapped file. Although your OS may be 32-bit, your file system is not. This does of course mean there will be swapping going on though, which is liable to be quite a bit slower than if you could really put the whole darn thing in RAM.

Also, you really ought to consider knocking the system's RAM up to the max (3GB on 32-bit Windows I believe) to see if that fixes it for you. That should only cost you about $100, and you are spending way more than that in man-hours just worrying about this.

like image 29
T.E.D. Avatar answered Oct 25 '22 11:10

T.E.D.