Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is doubling the capacity of a dynamic array necessary?

Tags:

c

When making automatically expanding arrays (like C++'s std::vector) in C, it is often common (or at least common advice) to double the size of the array each time it is filled to limit the amount of calls to realloc in order to avoid copying the entire array as much as possible.

Eg. we start by allocating room for 8 elements, 8 elements are inserted, we then allocate room for 16 elements, 8 more elements are inserted, we allocate for 32.., etc.

But realloc does not have to actually copy the data if it can expand the existing memory allocation. For example, the following code only does 1 copy (the initial NULL allocation, so it is not really a copy) on my system, even though it calls realloc 10000 times:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    int i;
    int copies = 0;
    void *data = NULL;
    void *ndata;

    for (i = 0; i < 10000; i++)
    {
        ndata = realloc(data, i * sizeof(int));
        if (data != ndata)
            copies++;
        data = ndata;
    }
    printf("%d\n", copies); 
}

I realize that this example is very clinical - a real world application would probably have more memory fragmentation and would do more copies, but even if I make a bunch of random allocations before the realloc loop, it only does marginally worse with 2-4 copies instead.

So, is the "doubling method" really necessary? Would it not be better to just call realloc each time a element is added to the dynamic array?

like image 431
Mike Pedersen Avatar asked Jan 12 '23 17:01

Mike Pedersen


2 Answers

You have to step back from your code for a minute and thing abstractly. What is the cost of growing a dynamic container? Programmers and researchers don't think in terms of "this took 2ms", but rather in terms of asymptotic complexity: What is the cost of growing by one element given that I already have n elements; how does this change as n increases?

If you only ever grew by a constant (or bounded) amount, then you would periodically have to move all the data, and so the cost of growing would depend on, and grow with, the size of the container. By contrast, when you grow the container geometrically, i.e. multiply its size by a fixed factor, every time it is full, then the expected cost of inserting is actually independent of the number of elements, i.e. constant.

It is of course not always constant, but it's amortized constant, meaning that if you keep inserting elements, then the average cost per element is constant. Every now and then you have to grow and move, but those events get rarer and rarer as you insert more and more elements.

I once asked whether it makes sense for C++ allocators to be able to grow, in the way that realloc does. The answers I got indicated that the non-moving growing behaviour of realloc is actually a bit of a red herring when you think asymptotically. Eventually you won't be able to grow anymore, and you'll have to move, and so for the sake of studying the asymptotic cost, it's actually irrelevant whether realloc can sometimes be a no-op or not. (Moreover, non-moving growth seems to upset moder, arena-based allocators, which expect all their allocations to be of a similar size.)

like image 133
Kerrek SB Avatar answered Jan 17 '23 13:01

Kerrek SB


Compared to almost every other type of operation, malloc, calloc, and especially realloc are very memory expensive. I've personally benchmarked 10,000,000 reallocs, and it takes a HUGE amount of time to do that.

Even though I had other operations going on at the same time (in both benchmark tests), I found that I could literally cut HOURS off of the run time by using max_size *= 2 instead of max_size += 1.

like image 44
ciphermagi Avatar answered Jan 17 '23 13:01

ciphermagi