Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In C++, what is the difference between new and new[] for array allocations

I am aware of the differences between free and delete in C++. But one thing I never understood is why in C malloc/free can allocate de-allocate both single 'objects' and arrays but in C++ we need to use the correct new/delete vs new[]/delete[] pair.

Searching on Stackoverflow, it seems that in C++, new[] allocates extra memory to hold the size of the allocated array and new only allocates the memory to the object itself. And because of that, you should be aware of this extra overhead.

If the previous paragraph is indeed the case, then how malloc/free handles this overhead? Or they just accept this overhead? And if it is tolerable in C, why not in C++?

On the other hand, in case it's not because of memory overhead, but because of calling constructors and destructors, couldn't the compiler be smart enough to generate the appropriate code under the hood and let the programmer just write new/delete for both single objects and arrays of objects?

I am writing a compiler for a toy language whose semantics is similar to C++ and it seems that it is possible to let the compiler decides how to allocate and de-allocate only using new/delete, but as C++ uses new/delete and new[]/delete[], maybe there's a catch that I am not seeing right now. Maybe something related to polymorphism and virtual tables?

If you're curious, my naive idea is to simple allocate an integer together with the object/array where this integer is the size of the array or simple 1 in case of being an object. Then, when calling delete, it checks the value of the integer, if it is 1, it calls the destructor. If it greater than 1, then it iterates the array calling the destructor to each object in the array. As I said, it seems to work and would let the programmer just write new/delete instead of new/delete vs new[]/delete. But then again, maybe there's a catch that I am not seeing.

Edited part:

After some answers, I decided to try to provide some pseudo-code and a better background.

In C language, memory allocations are usually made with malloc() and de-allocations with free(). Either if you are allocating a single POD, a single struct or an array, malloc() fits all these cases. There is no need for different versions of malloc() if you are allocating a single struct vs a malloc_array() version if you are allocating an array. At least at the public API level. In other words, it seems it doesn't matter if you are allocating a few bytes or many bytes, there will be no overhead for bookkeeping the allocation size information.

As many of you are aware, including myself, new and delete do more than just allocate and de-allocate memory. New allocate memory and call the constructor and delete calls the destructor and then de-allocate memory. But in C++, you need to be aware if you are allocating just a single object or an array of objects. In case you are allocating an array, you need to use the new[]/delete[] pair.

In C, if you implement a binary tree, nodes will be allocated with malloc and de-allocated with free and in C++ with new and delete. But if you are implementing something like the vector class in C++, in C you still would use malloc/free, but now in C++ you would need to use new[]/delete[] (considering a sane implementation without too much black magic).

Consider the following pseudo-code that is executed by the compiler. In this pseudo-code, the delete function somehow gets access to the malloc internals and knows how many bytes there are, which in turn can be easily used to calculate how many objects there are. As this delete implementation is using malloc internals to know how much memory is allocated, in theory there should be no overhead of bookkeeping.

// ClassType is a meta type only know by the compiler
// it stores a class info such as name, size, constructors and so on
void *new(ClassType c) {
    // allocates memory with malloc. Malloc() do the storage bookkeeping
    // note that somehow malloc is allocating just a single object
    c *ptr = malloc(sizeof(c));

    // now, call the constructor of the requested class
    c.constructor(ptr);

    // return the new object
    return ptr;
}

void *new(ClassType c, size_t n) {
    c *ptr = malloc(sizeof(c) * n);

    // iterate over the array and construct each object
    for (i = 0; i < n; ++i) {
        c.constructor(ptr[i]);
    }

    return ptr;
}

// this delete version seems to be able to de-allocate both single
// objects and arrays of objects with no overhead of bookkeeping because
// the bookkeeping is made by malloc/free. So I would need 
// just a new/delete pair instead of new/delete vs new[]/delete[]
// Why C++ doesn't use something like my proposed implementation? 
// What low-level details prohibits this implementation from working?
void delete(ClassType c, void *ptr) {
    // get raw information of how many bytes are used by ptr;
    size_t n = malloc_internals_get_size(ptr);

    // convert the number of bytes to number of objects in the array
    n = c.bytesToClassSize(n);

    c* castedPointer = (c*) ptr;

    // calls the destructor
    for (i = 0; i < n; ++i) {
        c.destructor(castedPointer[i]);
    }

    // free memory chunk
    free(ptr);
}
like image 806
Hadley Siqueira Avatar asked Dec 09 '22 23:12

Hadley Siqueira


2 Answers

why in C malloc/free can allocate de-allocate both single 'objects'

Malloc doesn't create any objects. It allocates "raw memory" which doesn't contain any objects. Correspondingly, free doesn't destroy any objects. new expressions do create objects, and delete destroys an object, while delete[] destroys an array of objects.

In order for the language implementation to know how many objects need to be destroyed by delete[], that number has to be stored somewhere. In order for the language implementation to know how many objects need to be destroyed by delete, that number does not need to be stored anywhere because it is always one.

Storing a number is not free, and storing an unused number is an unnecessary overhead. The different forms of deletion exist so that the language implementation can destroy the correct number of objects without having to store the number of objects created by a non-array new.

then how malloc/free handles this overhead?

malloc/free doesn't have this overhead since it doesn't create or destroy objects. As such, there is nothing that needs to be handled.

There is an analogous issue of storing the number of allocated bytes that malloc does need to deal with. There is no analogous separate function for allocation or freeing of a single byte. This may be because such use case is probably rare. Malloc has more clever ways of dealing with storing this because allocating more memory than is needed is not observable, while such trick is not possible with number of objects because creation and destruction of objects is observable (at least in case of non-trivial types).

new typically deals with the issue of storing the number of allocated bytes through using malloc internally.

couldn't the compiler be smart enough to generate the appropriate code under the hood

Not without some kind of overhead, no. With overhead yes, it could.

But then again, maybe there's a catch that I am not seeing.

I'm not sure if it is a catch that you haven't seen, but the catch with your idea is the overhead of the integer that is to be allocated even when a single object is allocated.

like image 132
eerorika Avatar answered Jan 12 '23 01:01

eerorika


Some clever implementations of malloc don't actually keep track of the size per allocation (by clever use of rounding up), thus it have extremely low space overhead. They'll allocate a large block of memory, and anytime a dev allocates <64 bytes, it'll just use the next 64 bytes of this block, and mark a single bit elsewhere that tracks that this block of 64 bytes is now in use. Even if the user only wants to allocate 1 byte, it hands out a whole block. This means each allocation has only a single bit overhead, so every 8 allocations has a shared byte of overhead. Practically nothing. (There are far smarter strategies than this, this is just a simplified example)

new and delete can share this super-low-overhead implementation, because delete knows to always destroy one object, regardless of the amount of space it actually has. This is again, super fast, and has low space overhead.

delete[] can't do that because it has to know exactly how many destructors to call. So it has to keep track of how many items are in the array, up to std::size_t, which means ~4 bytes have to be added to every new[]. If the data requires an alignment >4, then each allocation also has to have wasted padding bytes between the count and the first item in the array. And delete[] therefore has to know how to look past the padding, find the count, so it knows exactly how many objects to destroy. This takes both time and more space, for every allocation.

C++ gives you the choice between "always works, but slower and bigger", and "only works for one item, but faster and smaller", so the program can run as fast as possible.

like image 44
Mooing Duck Avatar answered Jan 12 '23 01:01

Mooing Duck