Allocating a large memory block in C++

I am trying to allocate a large memory block for a 3D matrix of floating-point values in C++. Its dimensions are 44100x22000x2. This should take exactly 44100x22000x2x4 bytes of memory, which is about 7.7 GB. I am compiling my code with g++ on a 64-bit x86 machine running Ubuntu. When I watch the process in htop, I see its memory usage grow to 32 GB, at which point it is promptly killed. Did I make a mistake in my memory calculation?

This is my code:

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  int N = 22000;
  int M = 44100;
  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    cout<<((float)m/(float)N)<<endl;
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      a[m][n] = new float[2];
    }
  }
}

EDIT: My calculation was incorrect, and I was allocating closer to 38 GB. I have now fixed the code to allocate 15 GB.

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  unsigned long  N = 22000;
  unsigned long  M = 44100;
  unsigned long blk_dim = N*(M-1)*2;
  float* blk = new float[blk_dim];
  unsigned long b = (unsigned long) blk;

  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    unsigned long offset1 = m*(M - 1)*2*sizeof(float);
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      unsigned long offset2 = n*2*sizeof(float);
      a[m][n] = (float*)(offset1 + offset2 + b);
    }
  }
}

366

asked Aug 29 '18 01:08

War Donkey

2 Answers

You forgot one dimension, and the overhead of allocating memory. The shown code allocates memory very inefficiently in the third dimension, resulting in way too much overhead.

float*** a = new float**[N];

This will allocate roughly 22000 * sizeof(float **), which is about 176 KB. Negligible.

a[m] = new float*[M - 1];

A single allocation here will be for 44099 * sizeof(float *), but you will grab 22000 of these. 22000 * 44099 * sizeof(float *), or roughly 7.7gb of additional memory. This is where you stopped counting, but your code isn't done yet. It's got a long ways to go.

a[m][n] = new float[2];

This is a single allocation of 8 bytes, but this allocation will be done 22000 * 44099 times. That's another 7.7gb flushed down the drain. You're now over 15 gigs of application-required memory, roughly, that needs to be allocated.

But each allocation does not come free, and new float[2] requires more than 8 bytes. Each individually allocated block must be tracked internally by your C++ library, so that it can be recycled by delete. The most simplistic linked-list-based implementation of heap allocation requires one forward pointer, one backward pointer, and a count of how many bytes are in the allocated block. Assuming nothing needs to be padded for alignment purposes, this is at least 24 bytes of overhead per allocation on a 64-bit platform.

Now, your code makes 22000 * 44099 allocations for the third dimension, 22000 allocations for the second dimension, and one allocation for the first dimension. If I count on my fingers, this will require (22000 * 44099 + 22000 + 1) * 24 bytes, or roughly another 23 gigabytes of memory, just for the bookkeeping overhead of the simplest, most basic memory allocation scheme.

We're now up to about 38 gigabytes of RAM needed with the simplest possible heap allocation tracking, if I did my math right. Your C++ implementation is likely to use a slightly more sophisticated heap allocation logic, with larger overhead.

Get rid of the new float[2]. Compute your matrix's size, and new a single 7.7gb chunk, then calculate where the rest of your pointers should be pointing to. Also, allocate a single chunk of memory for the second dimension of your matrix, and compute the pointers for the first dimension.

Your allocation code should execute exactly three new statements: one for the first-dimension pointers, one for the second-dimension pointers, and one more for the huge chunk of data that comprises your third dimension.

146

answered Nov 08 '22 13:11

Sam Varshavchik

Just to round out one answer already given, the example below is basically an extension of the answer given here on how to create a contiguous 2D array, and illustrates the usage of only 3 calls to new[].

The advantage is that you keep the [][][] syntax you would normally use with triple pointers (although I highly advise against writing code using "3 stars" like this, but we have what we have). The disadvantage is that extra memory is allocated for the pointers, in addition to the single memory pool for the data.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <new>

template <typename T>
T*** create3DArray(std::size_t pages, std::size_t nrows, std::size_t ncols, const T& val = T())
{
    T*** ptr = nullptr;    // one pointer per page
    T** ptrMem = nullptr;  // one pointer per row, for all pages
    T* pool = nullptr;     // the contiguous data
    try
    {
        ptr = new T**[pages];                 // allocate pointers to pages
        ptrMem = new T*[pages * nrows];       // allocate pointers to rows
        pool = new T[pages * nrows * ncols];  // allocate pool
        // new T[n]{ val } would set only the first element, so fill explicitly
        std::fill_n(pool, pages * nrows * ncols, val);

        // Point each page at its block of row pointers,
        // and each row pointer at its row in the data pool
        for (std::size_t i = 0; i < pages; ++i)
        {
            ptr[i] = ptrMem + i * nrows;
            for (std::size_t j = 0; j < nrows; ++j)
                ptr[i][j] = pool + (i * nrows + j) * ncols;
        }
        return ptr;
    }
    catch (const std::bad_alloc&)
    {
        // roll back the previous allocations; delete[] on nullptr is a no-op
        delete[] ptrMem;
        delete[] ptr;
        throw;  // rethrow the original exception rather than a copy
    }
}

template <typename T>
void delete3DArray(T*** arr)
{
    delete[] arr[0][0]; // remove pool
    delete[] arr[0];  // remove the pointers
    delete[] arr;     // remove the pages
}

int main()
{
    double ***dPtr = nullptr;
    try 
    {
        dPtr = create3DArray<double>(4100, 5000, 2);
    }
    catch(std::bad_alloc& )
    {
        std::cout << "Could not allocate memory";
        return -1;
    }
    dPtr[0][0][0] = 10;  // for example
    std::cout << dPtr[0][0][0] << "\n";
    delete3DArray(dPtr);  // free the memory
}

answered Nov 08 '22 12:11

PaulMcKenzie
