Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it fair to always recommend std::vector over realloc?

Tags:

c++

realloc

From Bjarne Stroustrup's FAQ:

If you feel the need for realloc() - and many do - then consider using a standard library vector.

I'll preface my question by agreeing that std::vector is better for many reasons, and I personally would always choose to use it over writing my own dynamic arrays with C memory allocation.

But, std::vector fragments memory as it grows because C++ has no equivalent of realloc (edit To clarify, I know that std::vector's storage is contiguous and won't get fragmented, I mean fragmentation of the memory space caused by allocating and deallocating, which realloc can avoid by extending an existing allocation). So is it fair to always recommend it over realloc? With great care, couldn't you write something that works just like std::vector but using C allocation functions, which has the possibility to grow its memory without moving its address and copying existing elements, making it as good or better in terms of fragmentation and performance?

And relatedly (bonus question!), why doesn't C++ have an equivalent to realloc? It seems like an odd thing to omit in a language that is so focused on performance. The section in Bjarne's FAQ has exactly that title (minus emphasis), but the answer doesn't address the 'why'. Was it just an accidental omission? Is there some fundamental incompatibility with how new/delete work? Does it not really give the benefits it seems to in practice?

Edit: ok, so I had neglected to consider the C nastiness of realloc - std::vector can't be rewritten using realloc because it only works with PODs, doesn't throw and so on. Perhaps a POD-only container written to deal with the nastiness would be a good idea for some situations. In any case though, the more interesting question becomes: would std::vector benefit from a C++ equivalent of realloc, which has (more or less) been answered here:

Does std::vector *have* to move objects when growing capacity? Or, can allocators "reallocate"?

Sadly, the answer seems to be "yes, but the standards committee didn't vote it in". Here's hoping.

like image 807
Ben Hymers Avatar asked Apr 08 '14 09:04

Ben Hymers


2 Answers

new/new[] and delete/delete[] are typically layered atop the C library allocation functions (ala malloc/realloc/free), possibly with an extra layer for small-object optimisations that use one malloc-ed region for quickly satisfying many small new requests. This layering meant supporting new and delete took very little implementation effort on the part of early C++ library authors.

To utilise the in-place resizing functionality in realloc for C++, though, invasive changes to the realloc library function are needed so that if movement to a new memory region is required, the C++ library code gets the chance to copy-construct/destruct the objects being moved. This could be done as:

  • a callback happening after realloc realised a move was necessary, asking the C++ library to do the actual data movement instead of doing a memcpy() style bytewise copy, or

  • as an additional resize-in-place-or-fail-without-moving function so the C++ library code could try that, then fall back on a malloc and proper/safe copy before deleteing the original objects and deallocating the original memory.

As most C library realloc functions lack any such hook/query facility, the C++ Standard - and Standard library - don't require it. As Mehrdad points out, this answer documents SGI's acknowledgement of this issue.

Given the extensive use of C++ these days, it would - IMHO - make sense to ship a malloc/realloc/free implementation in the C++ library itself that does provide such a hook/query, so that C++ library authors who see utility in realloc can utilise it freely; that'd be a worthy candidate for inclusion in a future Standard.

With great care, couldn't you write something that works just like std::vector but using C allocation functions, which has the possibility to grow its memory without moving its address and copying existing elements, making it as good or better in terms of fragmentation and performance?

As above - no - it's not possible to copy-construct/destruct objects with any amount of care without changes to the realloc API.

like image 136
Tony Delroy Avatar answered Oct 21 '22 02:10

Tony Delroy


Direct comparison

                        |  std::vector     | C memory functions
------------------------+------------------+------------------------
default capacity        | undefined        | undefined
default grow            | towards capacity | undefined
deterministic capacity  | available        | no
deterministic grow      | available        | no
deterministic mem.-move | available        | no
non-POD types           | yes              | f***ing no (*)
no-throw                | no               | yes

deterministic mem.-move follows from deterministic capacity/grow. It is when realloc and std::vector have to move their stored elements to a new memory location.

I think the (available) determinism with regards to memory moving is doubly important when you consider moving (smart) references of any kind.

NOTE: In this respect, I use the term "deterministic" with respect to my source codes lifetime, i.e. its lifetime across different versions of different libraries with different compile flags, etc..


Source

It does fragment memory as much as realloc does:

Class template vector overview [vector.overview]

The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()

In other words, the memory used is in one piece.

The one big difference is that realloc can actually increase allocated memory portions without having ordered it to do so, however, it is not required to do so (man 3 realloc):

man 3 realloc

The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old size, the added memory will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), cal‐ loc() or realloc(). If the area pointed to was moved, a free(ptr) is done.

So it can increase the size, but is not required to.

A std::vector carries not only a size, but also a capacity. If you know beforehand you will need a big vector, yet you cannot initialize everything right now, you are entitled to increase the capacity of your vector like so:

std::vector<T> vec(32);
vec.reserve(1024);
// vec has size 32, but reserved a memory region of 1024 elements

So, unlike realloc, the moment when reallocations occur can be deterministic with std::vector.

To answer your question: Because there is std::vector, realloc is not needed. And, realloc is not allowed for non-POD types; attempts to use malloc, free and realloc directly on non-PODs yields undefined behaviour.

like image 28
Sebastian Mach Avatar answered Oct 21 '22 04:10

Sebastian Mach