Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C - Implementing fast push of many elements to the end of array

I have a simple struct to hold an array:

struct array_of_a_type {
        size_t allocated_size;
        size_t elements; /* 1-index based */
        a_type *array;
};

I want to write a simple function, something like this:

bool simple_function(struct array_of_a_type *my_array, int a, int b, int c, int d)
{
    a_type new_chunk[] = {
        a,   b,   a+b, d,   c,
        c,   c,   c+d, b+d, a,
        a+c, b+c, c+d, c+d, c,
    };
    size_t size = sizeof(new_chunk) / sizeof(a_type);
    return push_to_array(my_array, new_chunk, size);
}

The my_array is a static, global variable. Below is an implementation of push_to_array.

static bool push_to_array(struct array_of_a_type *a, a_type *new_chunk, size_t size)
{
    const size_t new_size = a->elements + size;
    const size_t old_size = a->elements;
    if (new_size > a->allocated_size) {
        /* The allocated_size is most of the time big enough.
           I’ve stripped this part of code to minimum. */
        a_type *tmp = realloc(a->array, new_size * sizeof(a_type));
        if (!tmp) {
            return true;
        } else {
            a->array = tmp;
            a->allocated_size = new_size;
        }
    }
    a->elements = new_size;
    memcpy(a->array + old_size, new_chunk, size * sizeof(a_type));
    return false;
}

My question:
How can I rewrite ‘simple_function’ to make more compilers generate code that will write directly to the destination? I would like the code to stay quite short and flexible.

My code works. Unfortunately the gcc (and an old clang) create temporary data on the stack and then copy it to destination. Below if a fragment of generated x86_64 assembler.

movq    8(%rsp), %rdx
movq    %rdx, 8(%rax)
movq    16(%rsp), %rdx
movq    %rdx, 16(%rax)
movq    24(%rsp), %rdx
movq    %rdx, 24(%rax)
movq    32(%rsp), %rdx
movq    %rdx, 32(%rax)

For AMD the assembler have this:

rep movsq

The new clang works fine. I've compiled with -O3.

I have tried with code that added one element a time. There was a lot of conditional jumps to call realloc, unfortunately.

like image 207
Michas Avatar asked Apr 19 '15 04:04

Michas


3 Answers

Some code refactoring must be done.

First you need a helper function that is similar to the function push_to_array, but this one only allocates new memory for elements:

static inline bool increase_size(struct array_of_a_type *a, size_t size)
{
    const size_t new_size = a->elements + size;
    if (new_size > a->allocated_size) {
        a_type *tmp = realloc(a->array, new_size * sizeof(a_type));
        if (!tmp) {
            return true;
        } else {
            a->array = tmp;
            a->allocated_size = new_size;
        }
    }
    a->elements = new_size;
    return false;
}

Coincidentally, the function push_to_array must be changed to avoid code duplication:

static bool push_to_array(struct array_of_a_type *a, a_type *new_chunk, size_t size)
{
    bool failed = increase_size( a , size );
    if( failed )
    {
        return failed;
    }
    memcpy(a->array + ( a->elements - size ), new_chunk, size * sizeof(a_type));
    return false;
}

The simple_function is now really easy to write, without using a temporary array:

bool simple_function(struct array_of_a_type *my_array, int a, int b, int c, int d)
{
    bool failed = increase_size( my_array , 15 );
    if( failed )
    {
        return failed;
    }

    size_t i = my_array->elements - 15;
    my_array->array[i] = a;
    my_array->array[i+1] = b;
    my_array->array[i+2] = a+b;
    my_array->array[i+3] = d;
    //... write the rest of the assignments
    my_array->array[i+14] = c;

    return false;
}
like image 21
2501 Avatar answered Nov 07 '22 18:11

2501


For efficiency, you need to separate the logic for growing the array, and assigning the values to the (unused) slots, to avoid the extra copy (from stack to array).

To prettify the code, you can create a set of helper macros. I shall assume that by "push" you mean "append to the array". If you really meant "prepend", then there is an additional memmove() needed.

Let's assume you have

#include <stdlib.h>
#include <stdio.h>

typedef int  array_data_type;

typedef struct {
    size_t           size;
    size_t           used;
    array_data_type *item;
} array_type;

#define ARRAY_INITIALIZER { 0, 0, NULL }

void array_free(array_type *const array)
{
    free(array->item);
    array->size = 0;
    array->used = 0;
    array->item = NULL;
}

void array_init(array_type *const array)
{
    array->size = 0;
    array->used = 0;
    array->item = NULL;
}

void array_init_size(array_type *const array, const size_t size)
{
    if (!size) {
        array->size = 0;
        array->used = 0;
        array->item = NULL;
        return;
    }

    array->item = malloc(size * sizeof array->item[0]);
    if (!array->item) {
        fprintf(stderr, "array_init_size(%p, %zu): Out of memory.\n", (void *)array, size);
        exit(EXIT_FAILURE);
    }
    array->size = size;
    array->used  = 0;
}

void array_grow_to(array_type *const array, size_t size)
{
    array_data_type *temp;

    if (size < 4)
        size = 4;
    else
    if (size < 16777216) {
        size |= size >> 1;
        size |= size >> 2;
        size |= size >> 4;
        size |= size >> 8;
        size |= size >> 16;
        size++;
    } else
        size = (size | 8388607) + 8388609;

    temp = realloc(array->item, size * sizeof array->item[0]);
    if (!temp) {
        fprintf(stderr, "array_grow_to(%p, %zu): Out of memory.\n", (void *)array, size);
        exit(EXIT_FAILURE);
    }

    array->item = temp;
    array->size = size;
}

static inline array_data_type *array_grow_by(array_type *const array, size_t const count)
{
    array_data_type *retval;

    if (array->used + count > array->size)
        array_grow_to(array, array->used + count);

    retval = array->item + array->used;
    array->used += count;
    return retval;
}

I like to use used for the number of elements in the array, and size for the number of elements the array has memory allocated for. Do a search and replace if you're used to some other names.

array_grow_to() adjusts the new size to at least 4, or the next power of two if less than 16,777,216, or to a larger multiple of 8,388,608. This limits the amount of allocated but unused memory for very large lists.

array_grow_by() ensures the array has room for count new elements, and returns the pointer to the first new unused element.

If you define the following C99 preprocessor macros,

#define MACRO_CONCATENATE(part1, ...)   part1 ## __VA_ARGS__

#define ARRAY_SET_N(array, count, ...)  MACRO_CONCATENATE(ARRAY_SET_, count)(array, count, __VA_ARGS__)
#define ARRAY_SET_0(...)
#define ARRAY_SET_1(a, n, v)        a[n-1] = v
#define ARRAY_SET_2(a, n, v, ...)   a[n-2] = v; ARRAY_SET_1(a, n, __VA_ARGS__)
#define ARRAY_SET_3(a, n, v, ...)   a[n-3] = v; ARRAY_SET_2(a, n, __VA_ARGS__)
#define ARRAY_SET_4(a, n, v, ...)   a[n-4] = v; ARRAY_SET_3(a, n, __VA_ARGS__)
#define ARRAY_SET_5(a, n, v, ...)   a[n-5] = v; ARRAY_SET_4(a, n, __VA_ARGS__)
#define ARRAY_SET_6(a, n, v, ...)   a[n-6] = v; ARRAY_SET_5(a, n, __VA_ARGS__)
#define ARRAY_SET_7(a, n, v, ...)   a[n-7] = v; ARRAY_SET_6(a, n, __VA_ARGS__)
#define ARRAY_SET_8(a, n, v, ...)   a[n-8] = v; ARRAY_SET_7(a, n, __VA_ARGS__)
#define ARRAY_SET_9(a, n, v, ...)   a[n-9] = v; ARRAY_SET_8(a, n, __VA_ARGS__)
#define ARRAY_SET_10(a, n, v, ...)  a[n-10] = v; ARRAY_SET_9(a, n, __VA_ARGS__)
#define ARRAY_SET_11(a, n, v, ...)  a[n-11] = v; ARRAY_SET_10(a, n, __VA_ARGS__)
#define ARRAY_SET_12(a, n, v, ...)  a[n-12] = v; ARRAY_SET_11(a, n, __VA_ARGS__)
#define ARRAY_SET_13(a, n, v, ...)  a[n-13] = v; ARRAY_SET_12(a, n, __VA_ARGS__)
#define ARRAY_SET_14(a, n, v, ...)  a[n-14] = v; ARRAY_SET_13(a, n, __VA_ARGS__)
#define ARRAY_SET_15(a, n, v, ...)  a[n-15] = v; ARRAY_SET_14(a, n, __VA_ARGS__)
#define ARRAY_SET_16(a, n, v, ...)  a[n-16] = v; ARRAY_SET_15(a, n, __VA_ARGS__)
#define ARRAY_SET_17(a, n, v, ...)  a[n-17] = v; ARRAY_SET_16(a, n, __VA_ARGS__)
#define ARRAY_SET_18(a, n, v, ...)  a[n-18] = v; ARRAY_SET_17(a, n, __VA_ARGS__)
#define ARRAY_SET_19(a, n, v, ...)  a[n-19] = v; ARRAY_SET_18(a, n, __VA_ARGS__)
#define ARRAY_SET_20(a, n, v, ...)  a[n-20] = v; ARRAY_SET_19(a, n, __VA_ARGS__)
#define ARRAY_SET_21(a, n, v, ...)  a[n-21] = v; ARRAY_SET_20(a, n, __VA_ARGS__)
#define ARRAY_SET_22(a, n, v, ...)  a[n-22] = v; ARRAY_SET_21(a, n, __VA_ARGS__)
#define ARRAY_SET_23(a, n, v, ...)  a[n-23] = v; ARRAY_SET_22(a, n, __VA_ARGS__)
#define ARRAY_SET_24(a, n, v, ...)  a[n-24] = v; ARRAY_SET_23(a, n, __VA_ARGS__)
#define ARRAY_SET_25(a, n, v, ...)  a[n-25] = v; ARRAY_SET_24(a, n, __VA_ARGS__)
#define ARRAY_SET_26(a, n, v, ...)  a[n-26] = v; ARRAY_SET_25(a, n, __VA_ARGS__)
#define ARRAY_SET_27(a, n, v, ...)  a[n-27] = v; ARRAY_SET_26(a, n, __VA_ARGS__)
#define ARRAY_SET_28(a, n, v, ...)  a[n-28] = v; ARRAY_SET_27(a, n, __VA_ARGS__)
#define ARRAY_SET_29(a, n, v, ...)  a[n-29] = v; ARRAY_SET_28(a, n, __VA_ARGS__)
#define ARRAY_SET_30(a, n, v, ...)  a[n-30] = v; ARRAY_SET_29(a, n, __VA_ARGS__)
#define ARRAY_SET_31(a, n, v, ...)  a[n-31] = v; ARRAY_SET_30(a, n, __VA_ARGS__)
#define ARRAY_SET_32(a, n, v, ...)  a[n-32] = v; ARRAY_SET_31(a, n, __VA_ARGS__)
#define ARRAY_SET_33(a, n, v, ...)  a[n-33] = v; ARRAY_SET_32(a, n, __VA_ARGS__)
#define ARRAY_SET_34(a, n, v, ...)  a[n-34] = v; ARRAY_SET_33(a, n, __VA_ARGS__)
#define ARRAY_SET_35(a, n, v, ...)  a[n-35] = v; ARRAY_SET_34(a, n, __VA_ARGS__)
#define ARRAY_SET_36(a, n, v, ...)  a[n-36] = v; ARRAY_SET_35(a, n, __VA_ARGS__)
#define ARRAY_SET_37(a, n, v, ...)  a[n-37] = v; ARRAY_SET_36(a, n, __VA_ARGS__)
#define ARRAY_SET_38(a, n, v, ...)  a[n-38] = v; ARRAY_SET_37(a, n, __VA_ARGS__)
#define ARRAY_SET_39(a, n, v, ...)  a[n-39] = v; ARRAY_SET_38(a, n, __VA_ARGS__)
#define ARRAY_SET_40(a, n, v, ...)  a[n-40] = v; ARRAY_SET_39(a, n, __VA_ARGS__)
#define ARRAY_SET_41(a, n, v, ...)  a[n-41] = v; ARRAY_SET_40(a, n, __VA_ARGS__)
#define ARRAY_SET_42(a, n, v, ...)  a[n-42] = v; ARRAY_SET_41(a, n, __VA_ARGS__)
#define ARRAY_SET_43(a, n, v, ...)  a[n-43] = v; ARRAY_SET_42(a, n, __VA_ARGS__)
#define ARRAY_SET_44(a, n, v, ...)  a[n-44] = v; ARRAY_SET_43(a, n, __VA_ARGS__)
#define ARRAY_SET_45(a, n, v, ...)  a[n-45] = v; ARRAY_SET_44(a, n, __VA_ARGS__)
#define ARRAY_SET_46(a, n, v, ...)  a[n-46] = v; ARRAY_SET_45(a, n, __VA_ARGS__)
#define ARRAY_SET_47(a, n, v, ...)  a[n-47] = v; ARRAY_SET_46(a, n, __VA_ARGS__)
#define ARRAY_SET_48(a, n, v, ...)  a[n-48] = v; ARRAY_SET_47(a, n, __VA_ARGS__)
#define ARRAY_SET_49(a, n, v, ...)  a[n-49] = v; ARRAY_SET_48(a, n, __VA_ARGS__)
#define ARRAY_SET_50(a, n, v, ...)  a[n-50] = v; ARRAY_SET_49(a, n, __VA_ARGS__)
#define ARRAY_SET_51(a, n, v, ...)  a[n-51] = v; ARRAY_SET_50(a, n, __VA_ARGS__)
#define ARRAY_SET_52(a, n, v, ...)  a[n-52] = v; ARRAY_SET_51(a, n, __VA_ARGS__)
#define ARRAY_SET_53(a, n, v, ...)  a[n-53] = v; ARRAY_SET_52(a, n, __VA_ARGS__)
#define ARRAY_SET_54(a, n, v, ...)  a[n-54] = v; ARRAY_SET_53(a, n, __VA_ARGS__)
#define ARRAY_SET_55(a, n, v, ...)  a[n-55] = v; ARRAY_SET_54(a, n, __VA_ARGS__)
#define ARRAY_SET_56(a, n, v, ...)  a[n-56] = v; ARRAY_SET_55(a, n, __VA_ARGS__)
#define ARRAY_SET_57(a, n, v, ...)  a[n-57] = v; ARRAY_SET_56(a, n, __VA_ARGS__)
#define ARRAY_SET_58(a, n, v, ...)  a[n-58] = v; ARRAY_SET_57(a, n, __VA_ARGS__)
#define ARRAY_SET_59(a, n, v, ...)  a[n-59] = v; ARRAY_SET_58(a, n, __VA_ARGS__)
#define ARRAY_SET_60(a, n, v, ...)  a[n-60] = v; ARRAY_SET_59(a, n, __VA_ARGS__)
#define ARRAY_SET_61(a, n, v, ...)  a[n-61] = v; ARRAY_SET_60(a, n, __VA_ARGS__)
#define ARRAY_SET_62(a, n, v, ...)  a[n-62] = v; ARRAY_SET_61(a, n, __VA_ARGS__)
#define ARRAY_SET_63(a, n, v, ...)  a[n-63] = v; ARRAY_SET_62(a, n, __VA_ARGS__)
#define ARRAY_SET_64(a, n, v, ...)  a[n-64] = v; ARRAY_SET_63(a, n, __VA_ARGS__)

#define ARRAY_APPEND_N(array, count, ...)                           \
    do {                                                            \
        array_data_type *const _base = array_grow_by(array, count); \
        ARRAY_SET_N(_base, count, __VA_ARGS__);                     \
    } while(0)

you can then write your simple function as

void simple_function(array_type *array,
                     const array_data_type a, const array_data_type b,
                     const array_data_type c, const array_data_type d)
{
    ARRAY_APPEND_N(array, 15, a,   b,   a+b, d,   c,
                              c,   c,   c+d, b+d, a,
                              a+c, b+c, c+d, c+d, c);
}

and have it preprocessed into (except for indentation)

void simple_function(array_type *array,
                     const array_data_type a, const array_data_type b,
                     const array_data_type c, const array_data_type d)
{
    do {
        array_data_type *const _base = array_grow_by(array, 15);
        _base[15 - 15] = a;
        _base[15 - 14] = b;
        _base[15 - 13] = a+b;
        _base[15 - 12] = d;
        _base[15 - 11] = c;
        _base[15 - 10] = c;
        _base[15 -  9] = c;
        _base[15 -  8] = c+d;
        _base[15 -  7] = b+d;
        _base[15 -  6] = a;
        _base[15 -  5] = a+c;
        _base[15 -  4] = b+c;
        _base[15 -  3] = c+d;
        _base[15 -  2] = c+d;
        _base[15 -  1] = c;
    } while(0);
}

which generally compiles to excellent machine code on Intel/AMD64 architectures (and other architectures that support relative addressing modes). On some other architectures, it is better to not make _base a constant, and instead autoincrement it (*(_base++) = v;).

If you implement a PP_NARG() macro to count the number of macro arguments, you can add macro

#define ARRAY_APPEND(array, ...) ARRAY_APPEND_N(array, PP_NARG(__VA_ARGS__), __VA_ARGS__)

in which case your function simplifies to

void simple_function(array_type *array,
                     const array_data_type a, const array_data_type b,
                     const array_data_type c, const array_data_type d)
{
    ARRAY_APPEND(array, a,   b,   a+b, d,   c,
                        c,   c,   c+d, b+d, a,
                        a+c, b+c, c+d, c+d, c);
}

The number of preprocessor macro arguments is limited to 64 in some compilers, which limits the maximum number of elements a single macro can add to 62. Depending on the compiler you use, you can extend the macros to support larger number of arguments, but other compilers may choke on those.

like image 117
Nominal Animal Avatar answered Nov 07 '22 19:11

Nominal Animal


Are you mad that your simple_function's a_type array is on the stack? That's because you made it an array with the [] which creates it on the stack. You need to make the array like this:

a_type *ap = malloc(<size> * sizeof(a_type));
atype[0] = a;
...

then you can return ap at the end.

Also you'll probably want to push to the array a member at a time, so you can keep the static array, and then do this:

int i;
for (i = 0; i < <size>; i++)
    push_to_array(&my_array, new[i]);

and have your push_to_array function change up a bit.

An implementation of push for a stack can be found here, note that the grow function handles the reallocation: https://github.com/minshallj/my_clib/blob/master/stack.c#L24-L31 You should be able to adjust this to your array "class".

Also, is my_array a global that lives somewhere else in your program? I don't see it declared anywhere.

like image 4
Jacob Minshall Avatar answered Nov 07 '22 18:11

Jacob Minshall