Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

struct of arrays and memory access patterns

This is sort of a follow up to this original question with some new information added. See here for the first part if you're interested: Struct of arrays, arrays of structs and memory usage pattern

It seems that there are quite a few problems with my first attempt at setting up struct of arrays for a simple class. Mainly being excess memory allocation for pointers and possible memory leaks from allocating those pointers from vec3_b in the previous question.

I though about how I could rearrange the data w/o using pointers, this requires me to first setup some const variable for the size of my data buckets, so no unbounded values like pointers but also reduces the amount of memory to something fixed.

    const size_t batch_size = 100;
    struct vec3_c
    {
    size_t x[batch_size];
    size_t y[batch_size];
    size_t z[batch_size];
    };

    struct vec3_c vec3_c(size_t x, size_t y, size_t z, size_t index)
    {
        struct vec3_c v;
        v.x[index] = x;
        v.y[index] = y;
        v.z[index] = z;
        return v;
    }

        struct vec3_c vc3;        
        for(int i = 0; i < batch_size; i++)
        {
            vc3 = vec3_c(i+1, i*i, i*10, i);
            //printf("vec3c x:%zu, y:%zu, z:%zu\n",vc3.x[i], vc3.y[i], vc3.z[i]);
            printf("vec3c x:%p, y:%p, z:%p\n",(void*)&vc3.x[i], (void*)&vc3.y[i], (void*)&vc3.z[i]);
        }

      ---------------x-----------------|----------------y-----------------|----------------z-----------------|

0|      0x7fff57489f40 : 140734657765184 | 0x7fff5748a260 : 140734657765984 | 0x7fff5748a580 : 140734657766784
1|      0x7fff57489f48 : 140734657765192 | 0x7fff5748a268 : 140734657765992 | 0x7fff5748a588 : 140734657766792
2|      0x7fff57489f50 : 140734657765200 | 0x7fff5748a270 : 140734657766000 | 0x7fff5748a590 : 140734657766800

with this updated code I have to have a fixed bucket size so I set it to batch_size of 100 just for simple numbers. Fill up the vec3c with some data and did a similar test, this time it seems that each value is aligned in 8 byte chunks.

ex:

size of vec3      : 24 bytes
size of vec3a     : 24 bytes
size of vec3b     : 24 bytes
size of vec3c     : 2400 bytes
size of size_t    : 8 bytes
size of int       : 4 bytes
size of 16 int    : 64 bytes
vec3c x:0x7fff592d2f40, y:0x7fff592d3260, z:0x7fff592d3580
vec3c x:0x7fff592d2f48, y:0x7fff592d3268, z:0x7fff592d3588
vec3c x:0x7fff592d2f50, y:0x7fff592d3270, z:0x7fff592d3590
vec3c x:0x7fff592d2f58, y:0x7fff592d3278, z:0x7fff592d3598
vec3c x:0x7fff592d2f60, y:0x7fff592d3280, z:0x7fff592d35a0
vec3c x:0x7fff592d2f68, y:0x7fff592d3288, z:0x7fff592d35a8
vec3c x:0x7fff592d2f70, y:0x7fff592d3290, z:0x7fff592d35b0
vec3c x:0x7fff592d2f78, y:0x7fff592d3298, z:0x7fff592d35b8
vec3c x:0x7fff592d2f80, y:0x7fff592d32a0, z:0x7fff592d35c0
vec3c x:0x7fff592d2f88, y:0x7fff592d32a8, z:0x7fff592d35c8

are all separated by 8 bytes.

That should get rid of the issues of memory leaks and the excess memory needed for the pointers.

with this being the new layout something like sizeof(vc3[0].x) would return 8 bytes.

back to the original questions:

  1. Is my implementation of struct vec3_c the proper way to setup a struct of array?

  2. with a vec_3c batch size of 100 it's showing 2400 bytes large but each individual element is only 8 bytes and properly aligned, so I could actually now fit 8 elements on 1 modern cpu cache line?

  3. would transforming data passed to me in say a typical format of just arrays of structures outweigh the performance benefits of being in a cache friendly state and able to operate on multiple data points per instruction call? This is with the caveat that both points 1 and 2 are correct.

ex doing the dot product of two vectors: that means I could get the dot product of 2 vec3_c per instruction cycle?

edit one more question, would it be better to add the additional 8 bytes of data to make this struct a multiple of 32 bytes and maybe use that additional 8 bytes as scratch space or just leave it empty?

edit It was pointed out to me that my initial initialization function was just making a mess of things. I updated it to this form:

struct vec3_c* vec3_c()
{
    struct vec3_c *v = (struct vec3_c*)malloc(sizeof(struct vec3_c));
    v->index = 0;
    return v;
}

struct vec3_c* v3 = vec3_c();
    for(size_t i = 0; i < batch_size; i++)
    {
        v3->x[i] = i + 1;
        v3->y[i] = i * i;
        v3->z[i] = i * 10;
        printf("index:%d\tvec3c x:%zu, y:%zu, z:%zu\n",i,v3->x[i], v3->y[i], v3->z[i]);
        printf("index:%zu\tvec3c x:%p, y:%p, z:%p\n",i,(void*)&v3->x[i], (void*)&v3->y[i], (void*)&v3->z[i]);
    }   
like image 487
user1610950 Avatar asked Jul 15 '15 06:07

user1610950


1 Answers

If you are going to have lots of these xyz points and you want to be able to perform an action on all the x's at once, then it makes more sense to put all the x's together:

struct PointBatch {
    size_t x[batchsize];
    size_t y[batchsize];
    size_t z[batchsize];
}

// More efficient for things like
// - find the point with the largest X
// - find the sum of all the points as [xsum, ysum, zsum]

If you normally operate on the x, y, and z of single data points, then it makes more sense to put each point together as a struct.

struct Point {
    size_t x;
    size_t y;
    size_t z;
}

struct Point pointBatch[batchsize];

// Better for things like
// - plot all the points on a graph
// - determine which points satisfy the equation: x^2 + y^2 < z^2

N.B.
Note that where performance is not an issue, you will probably find that the Point/pointBatch approach makes your code easier to write and more readable as struct PointBatch gives you no convenient way to refer to or pass around a single point.

like image 59
2 revs, 2 users 86% Avatar answered Sep 24 '22 06:09

2 revs, 2 users 86%