I am trying to allocate a large memory block for a 3D matrix of floating-point values in C++. Its dimensions are 44100x22000x2. This should take exactly 44100x22000x2x4 bytes of memory, which is about 7.7 GB. I am compiling my code using g++ on a 64-bit x86 machine with Ubuntu. When I view the process using htop, I see that the memory usage grows to 32 GB and the process is promptly killed. Did I make a mistake in my memory calculation?
This is my code:
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
    int N = 22000;
    int M = 44100;
    float*** a = new float**[N];
    for (int m = 0; m<N; m+=1) {
        cout<<((float)m/(float)N)<<endl;
        a[m] = new float*[M - 1];
        for (int n = 0; n<M - 1; n+=1) {
            a[m][n] = new float[2];
        }
    }
}
EDIT: My calculation was incorrect, and I was allocating closer to 38gb. I fixed the code now to allocate 15gb.
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
    unsigned long N = 22000;
    unsigned long M = 44100;
    unsigned long blk_dim = N*(M-1)*2;
    float* blk = new float[blk_dim];
    unsigned long b = (unsigned long) blk;
    float*** a = new float**[N];
    for (int m = 0; m<N; m+=1) {
        unsigned long offset1 = m*(M - 1)*2*sizeof(float);
        a[m] = new float*[M - 1];
        for (int n = 0; n<M - 1; n+=1) {
            unsigned long offset2 = n*2*sizeof(float);
            a[m][n] = (float*)(offset1 + offset2 + b);
        }
    }
}
You forgot one dimension, and the overhead of allocating memory. The shown code allocates memory very inefficiently in the third dimension, resulting in way too much overhead.
float*** a = new float**[N];
This will allocate roughly 22000 * sizeof(float **), which is roughly 176 KB. Negligible.
a[m] = new float*[M - 1];
A single allocation here will be for 44099 * sizeof(float *), but you will grab 22000 of these: 22000 * 44099 * sizeof(float *), or roughly 7.7 GB of additional memory. This is where you stopped counting, but your code isn't done yet. It's got a long way to go.
a[m][n] = new float[2];
This is a single allocation of 8 bytes, but this allocation will be done 22000 * 44099 times. That's another 7.7gb flushed down the drain. You're now over 15 gigs of application-required memory, roughly, that needs to be allocated.
But each allocation does not come free, and new float[2] requires more than 8 bytes. Each individually allocated block must be tracked internally by your C++ library, so that it can be recycled by delete. The most simplistic linked-list-based implementation of heap allocation requires one forward pointer, one backward pointer, and the count of how many bytes there are in the allocated block. Assuming nothing needs to be padded for alignment purposes, this is at least 24 bytes of overhead per allocation on a 64-bit platform.
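The 24-byte figure can be seen from a sketch of such a bookkeeping header. This is only an illustration of the textbook model described above: BlockHeader and footprint are hypothetical names, and real allocators lay out their metadata differently.

```cpp
#include <cstddef>

// Hypothetical header for a textbook doubly linked heap allocator.
// Real allocators (glibc, jemalloc, ...) use different layouts.
struct BlockHeader {
    BlockHeader* next;  // forward pointer
    BlockHeader* prev;  // backward pointer
    std::size_t  size;  // number of bytes in the allocated block
};

// Bytes consumed by one allocation of `payload` bytes under this model,
// ignoring any alignment padding.
constexpr std::size_t footprint(std::size_t payload) {
    return sizeof(BlockHeader) + payload;
}

// Two pointers plus one size field: 24 bytes on a typical 64-bit platform.
static_assert(sizeof(BlockHeader) == 2 * sizeof(void*) + sizeof(std::size_t),
              "header is expected to have no padding");
```

Under this model, footprint(2 * sizeof(float)) comes to 32 bytes on a typical 64-bit platform, so the header costs three times as much as the two floats it tracks.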
Now, since your third dimension makes 22000 * 44099 allocations, 22000 allocations for the second dimension, and one allocation for the first dimension: if I count on my fingers, this will require (22000 * 44099 + 22000 + 1) * 24, or another 22 gigabytes of memory, just to consume the overhead of the most simple, basic memory allocation scheme.
We're now up to about 38 gigabytes of RAM needed using the simplest possible heap allocation tracking, if I did my math right. Your C++ implementation is likely to use a slightly more sophisticated heap allocation logic, with larger overhead.
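The running total can be checked with a few compile-time constants. This is a sketch of the arithmetic only, under the same assumptions as the text: 8-byte pointers, 4-byte floats, and 24 bytes of bookkeeping per allocation. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Assumed sizes for a 64-bit platform and the simple allocator model.
constexpr std::uint64_t kN = 22000;      // first dimension
constexpr std::uint64_t kM1 = 44099;     // second dimension (M - 1)
constexpr std::uint64_t kPtr = 8;        // sizeof(float*), sizeof(float**)
constexpr std::uint64_t kFloat = 4;      // sizeof(float)
constexpr std::uint64_t kOverhead = 24;  // assumed bookkeeping per allocation

// One top-level array, kN row arrays, kN * kM1 two-float arrays.
constexpr std::uint64_t allocationCount() { return kN * kM1 + kN + 1; }

constexpr std::uint64_t topLevelBytes()   { return kN * kPtr; }               // 176,000
constexpr std::uint64_t rowPointerBytes() { return kN * kM1 * kPtr; }         // 7,761,424,000
constexpr std::uint64_t dataBytes()       { return kN * kM1 * 2 * kFloat; }   // 7,761,424,000
constexpr std::uint64_t overheadBytes()   { return allocationCount() * kOverhead; }  // 23,284,800,024

constexpr std::uint64_t totalBytes() {
    return topLevelBytes() + rowPointerBytes() + dataBytes() + overheadBytes();
}
```

With these assumptions totalBytes() evaluates to 38,807,824,024, which is the "about 38 gigabytes" figure, and more than half of it is bookkeeping rather than pointers or data.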
Get rid of the new float[2]. Compute your matrix's size, and new a single 7.7 GB chunk, then calculate where the rest of your pointers should be pointing to. Also, allocate a single chunk of memory for the second dimension of your matrix, and compute the pointers for the first dimension.
Your allocation code should execute exactly three new statements: one for the first-dimension pointers, one for the second-dimension pointers, and one more for the huge chunk of data that comprises your third dimension.
Just to round out one answer already given, the example below is basically an extension of the answer given here on how to create a contiguous 2D array, and illustrates the usage of only 3 calls to new[].
The advantage is that you keep the [][][] syntax you would normally use with triple pointers (although I highly advise against writing code using "3 stars" like this, but we have what we have). The disadvantage is that more memory is allocated for the pointers, in addition to the single memory pool for the data.
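#include <algorithm>  // std::fill_n
#include <cstddef>    // std::size_t
#include <iostream>
#include <new>        // std::bad_alloc
template <typename T>
T*** create3DArray(unsigned pages, unsigned nrows, unsigned ncols, const T& val = T())
{
    T*** ptr = nullptr;    // one pointer per page
    T** ptrMem = nullptr;  // one pointer per row, for all pages
    T* pool = nullptr;     // the contiguous data
    try
    {
        // Do the size arithmetic in std::size_t so it cannot wrap in 32 bits
        const std::size_t nrowPtrs = static_cast<std::size_t>(pages) * nrows;
        const std::size_t total = nrowPtrs * ncols;
        ptr = new T**[pages];           // allocate pointers to pages
        ptrMem = new T*[nrowPtrs];      // allocate pointers to pool
        pool = new T[total];            // allocate pool
        std::fill_n(pool, total, val);  // new T[n]{ val } would set only the first element
        // Assign page pointers to point to the pages memory,
        // and pool pointers to point to each row of the data pool
        for (unsigned i = 0; i < pages; ++i, ptrMem += nrows)
        {
            ptr[i] = ptrMem;
            for (unsigned j = 0; j < nrows; ++j, pool += ncols)
                ptr[i][j] = pool;
        }
        return ptr;
    }
    catch (const std::bad_alloc&)
    {
        // roll back the previous allocations
        delete[] ptrMem;
        delete[] ptr;
        throw;  // rethrow the original exception without copying it
    }
}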
template <typename T>
void delete3DArray(T*** arr)
{
    delete[] arr[0][0]; // remove pool
    delete[] arr[0];    // remove the pointers
    delete[] arr;       // remove the pages
}
int main()
{
    double ***dPtr = nullptr;
    try
    {
        dPtr = create3DArray<double>(4100, 5000, 2);
    }
    catch(std::bad_alloc& )
    {
        std::cout << "Could not allocate memory";
        return -1;
    }
    dPtr[0][0][0] = 10; // for example
    std::cout << dPtr[0][0][0] << "\n";
    delete3DArray(dPtr); // free the memory
}
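If the [][][] syntax is not a hard requirement, the pointer tables can be dropped entirely, in line with the advice against "3 stars" above. The following is a minimal sketch of that alternative, not part of the original answer: flatIndex and Flat3D are hypothetical names, and the layout (pages, then rows, then columns) is assumed to match the pool layout used by create3DArray.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of element (page, row, col) in one flat buffer
// laid out page by page, row by row.
constexpr std::size_t flatIndex(std::size_t page, std::size_t row, std::size_t col,
                                std::size_t nrows, std::size_t ncols) {
    return (page * nrows + row) * ncols + col;
}

// Hypothetical wrapper: one contiguous allocation, no pointer tables.
template <typename T>
class Flat3D {
public:
    Flat3D(std::size_t pages, std::size_t nrows, std::size_t ncols, const T& val = T())
        : nrows_(nrows), ncols_(ncols), data_(pages * nrows * ncols, val) {}

    // Element access with a(p, r, c) instead of a[p][r][c]; no bounds checking.
    T& operator()(std::size_t p, std::size_t r, std::size_t c) {
        return data_[flatIndex(p, r, c, nrows_, ncols_)];
    }
    const T& operator()(std::size_t p, std::size_t r, std::size_t c) const {
        return data_[flatIndex(p, r, c, nrows_, ncols_)];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::size_t nrows_;
    std::size_t ncols_;
    std::vector<T> data_;  // freed automatically when the object goes out of scope
};
```

For the matrix in the question, Flat3D<float> a(22000, 44099, 2) would request the roughly 7.7 GB of data in a single allocation, and a(m, n, 1) = 1.0f replaces a[m][n][1] = 1.0f. The trade-off is a multiplication per access in exchange for no pointer storage and no manual delete.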