Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

malloc & placement new vs. new

I've been looking into this for the past few days, and so far I haven't really found anything convincing other than dogmatic arguments or appeals to tradition (i.e. "it's the C++ way!").

If I'm creating an array of objects, what is the compelling reason (other than ease) for using:

#define MY_ARRAY_SIZE 10

//  ...

my_object * my_array=new my_object [MY_ARRAY_SIZE];

for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i]=my_object(i);

over

#define MEMORY_ERROR -1
#define MY_ARRAY_SIZE 10

//  ...

my_object * my_array=(my_object *)malloc(sizeof(my_object)*MY_ARRAY_SIZE);
if (my_object==NULL) throw MEMORY_ERROR;

for (int i=0;i<MY_ARRAY_SIZE;++i) new (my_array+i) my_object (i);

As far as I can tell the latter is much more efficient than the former (since you don't initialize memory to some non-random value/call default constructors unnecessarily), and the only difference really is the fact that one you clean up with:

delete [] my_array;

and the other you clean up with:

for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i].~T();

free(my_array);

I'm out for a compelling reason. Appeals to the fact that it's C++ (not C) and therefore malloc and free shouldn't be used isn't -- as far as I can tell -- compelling as much as it is dogmatic. Is there something I'm missing that makes new [] superior to malloc?

I mean, as best I can tell, you can't even use new [] -- at all -- to make an array of things that don't have a default, parameterless constructor, whereas the malloc method can thusly be used.

like image 321
Robert Allan Hennigan Leahy Avatar asked Jan 22 '12 07:01

Robert Allan Hennigan Leahy


3 Answers

I'm out for a compelling reason.

It depends on how you define "compelling". Many of the arguments you have thus far rejected are certainly compelling to most C++ programmers, as your suggestion is not the standard way to allocate naked arrays in C++.

The simple fact is this: yes, you absolutely can do things the way you describe. There is no reason that what you are describing will not function.

But then again, you can have virtual functions in C. You can implement classes and inheritance in plain C, if you put the time and effort into it. Those are entirely functional as well.

Therefore, what matters is not whether something can work. But more on what the costs are. It's much more error prone to implement inheritance and virtual functions in C than C++. There are multiple ways to implement it in C, which leads to incompatible implementations. Whereas, because they're first-class language features of C++, it's highly unlikely that someone would manually implement what the language offers. Thus, everyone's inheritance and virtual functions can cooperate with the rules of C++.

The same goes for this. So what are the gains and the losses from manual malloc/free array management?

I can't say that any of what I'm about to say constitutes a "compelling reason" for you. I rather doubt it will, since you seem to have made up your mind. But for the record:

Performance

You claim the following:

As far as I can tell the latter is much more efficient than the former (since you don't initialize memory to some non-random value/call default constructors unnecessarily), and the only difference really is the fact that one you clean up with:

This statement suggests that the efficiency gain is primarily in the construction of the objects in question. That is, which constructors are called. The statement presupposes that you don't want to call the default constructor; that you use a default constructor just to create the array, then use the real initialization function to put the actual data into the object.

Well... what if that's not what you want to do? What if what you want to do is create an empty array, one that is default constructed? In this case, this advantage disappears entirely.

Fragility

Let's assume that each object in the array needs to have a specialized constructor or something called on it, such that initializing the array requires this sort of thing. But consider your destruction code:

for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i].~T(); 

For a simple case, this is fine. You have a macro or const variable that says how many objects you have. And you loop over each element to destroy the data. That's great for a simple example.

Now consider a real application, not an example. How many different places will you be creating an array in? Dozens? Hundreds? Each and every one will need to have its own for loop for initializing the array. Each and every one will need to have its own for loop for destroying the array.

Mis-type this even once, and you can corrupt memory. Or not delete something. Or any number of other horrible things.

And here's an important question: for a given array, where do you keep the size? Do you know how many items you allocated for every array that you create? Each array will probably have its own way of knowing how many items it stores. So each destructor loop will need to fetch this data properly. If it gets it wrong... boom.

And then we have exception safety, which is a whole new can of worms. If one of the constructors throws an exception, the previously constructed objects need to be destructed. Your code doesn't do that; it's not exception-safe.

Now, consider the alternative:

delete[] my_array; 

This can't fail. It will always destroy every element. It tracks the size of the array, and it's exception-safe. So it is guaranteed to work. It can't not work (as long as you allocated it with new[]).

Of course, you could say that you could wrap the array in an object. That makes sense. You might even template the object on the type elements of the array. That way, all the desturctor code is the same. The size is contained in the object. And maybe, just maybe, you realize that the user should have some control over the particular way the memory is allocated, so that it's not just malloc/free.

Congratulations: you just re-invented std::vector.

Which is why many C++ programmers don't even type new[] anymore.

Flexibility

Your code uses malloc/free. But let's say I'm doing some profiling. And I realize that malloc/free for certain frequently created types is just too expensive. I create a special memory manager for them. But how to hook all of the array allocations to them?

Well, I have to search the codebase for any location where you create/destroy arrays of these types. And then I have to change their memory allocators accordingly. And then I have to continuously watch the codebase so that someone else doesn't change those allocators back or introduce new array code that uses different allocators.

If I were instead using new[]/delete[], I could use operator overloading. I simply provide an overload for operators new[] and delete[] for those types. No code has to change. It's much more difficult for someone to circumvent these overloads; they have to actively try to. And so forth.

So I get greater flexibility and reasonable assurance that my allocators will be used where they should be used.

Readability

Consider this:

my_object *my_array = new my_object[10]; for (int i=0; i<MY_ARRAY_SIZE; ++i)   my_array[i]=my_object(i);  //... Do stuff with the array  delete [] my_array; 

Compare it to this:

my_object *my_array = (my_object *)malloc(sizeof(my_object) * MY_ARRAY_SIZE); if(my_object==NULL)   throw MEMORY_ERROR;  int i; try {     for(i=0; i<MY_ARRAY_SIZE; ++i)       new(my_array+i) my_object(i); } catch(...)  //Exception safety. {     for(i; i>0; --i)  //The i-th object was not successfully constructed         my_array[i-1].~T();     throw; }  //... Do stuff with the array  for(int i=MY_ARRAY_SIZE; i>=0; --i)   my_array[i].~T(); free(my_array); 

Objectively speaking, which one of these is easier to read and understand what's going on?

Just look at this statement: (my_object *)malloc(sizeof(my_object) * MY_ARRAY_SIZE). This is a very low level thing. You're not allocating an array of anything; you're allocating a hunk of memory. You have to manually compute the size of the hunk of memory to match the size of the object * the number of objects you want. It even features a cast.

By contrast, new my_object[10] tells the story. new is the C++ keyword for "create instances of types". my_object[10] is a 10 element array of my_object type. It's simple, obvious, and intuitive. There's no casting, no computing of byte sizes, nothing.

The malloc method requires learning how to use malloc idiomatically. The new method requires just understanding how new works. It's much less verbose and much more obvious what's going on.

Furthermore, after the malloc statement, you do not in fact have an array of objects. malloc simply returns a block of memory that you have told the C++ compiler to pretend is a pointer to an object (with a cast). It isn't an array of objects, because objects in C++ have lifetimes. And an object's lifetime does not begin until it is constructed. Nothing in that memory has had a constructor called on it yet, and therefore there are no living objects in it.

my_array at that point is not an array; it's just a block of memory. It doesn't become an array of my_objects until you construct them in the next step. This is incredibly unintuitive to a new programmer; it takes a seasoned C++ hand (one who probably learned from C) to know that those aren't live objects and should be treated with care. The pointer does not yet behave like a proper my_object*, because it doesn't point to any my_objects yet.

By contrast, you do have living objects in the new[] case. The objects have been constructed; they are live and fully-formed. You can use this pointer just like any other my_object*.

Fin

None of the above says that this mechanism isn't potentially useful in the right circumstances. But it's one thing to acknowledge the utility of something in certain circumstances. It's quite another to say that it should be the default way of doing things.

like image 169
Nicol Bolas Avatar answered Sep 21 '22 00:09

Nicol Bolas


If you do not want to get your memory initialized by implicit constructor calls, and just need an assured memory allocation for placement new then it is perfectly fine to use malloc and free instead of new[] and delete[].

The compelling reasons of using new over malloc is that new provides implicit initialization through constructor calls, saving you additional memset or related function calls post an malloc And that for new you do not need to check for NULL after every allocation, just enclosing exception handlers will do the job saving you redundant error checking unlike malloc.
These both compelling reasons do not apply to your usage.

which one is performance efficient can only be determined by profiling, there is nothing wrong in the approach you have now. On a side note I don't see a compelling reason as to why use malloc over new[] either.

like image 23
Alok Save Avatar answered Sep 22 '22 00:09

Alok Save


I would say neither.

The best way to do it would be:

std::vector<my_object>   my_array;
my_array.reserve(MY_ARRAY_SIZE);

for (int i=0;i<MY_ARRAY_SIZE;++i)
{    my_array.push_back(my_object(i));
}

This is because internally vector is probably doing the placement new for you. It also managing all the other problems associated with memory management that you are not taking into account.

like image 30
Martin York Avatar answered Sep 22 '22 00:09

Martin York