Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing array transposing function

I'm working on a homework assignment, and I've been stuck for hours on my solution. The problem we've been given is to optimize the following code, so that it runs faster, regardless of how messy it becomes. We're supposed to use stuff like exploiting cache blocks and loop unrolling.

Problem:

//transpose a dim x dim matrix into dist by swapping all i,j with j,i
void transpose(int *dst, int *src, int dim) {
    int i, j;

    for(i = 0; i < dim; i++) {
        for(j = 0; j < dim; j++) {
                dst[j*dim + i] = src[i*dim + j];
        }
    }
}

What I have so far:

//attempt 1
void transpose(int *dst, int *src, int dim) {
    int i, j, id, jd;

    id = 0;
    for(i = 0; i < dim; i++, id+=dim) {
        jd = 0;
        for(j = 0; j < dim; j++, jd+=dim) {
                dst[jd + i] = src[id + j];
        }
    }
}

//attempt 2
void transpose(int *dst, int *src, int dim) {
    int i, j, id;
    int *pd, *ps;
    id = 0;
    for(i = 0; i < dim; i++, id+=dim) {
        pd = dst + i;
        ps = src + id;
        for(j = 0; j < dim; j++) {
                *pd = *ps++;
                pd += dim;
        }
    }
}

Some ideas, please correct me if I'm wrong:

I have thought about loop unrolling but I dont think that would help, because we don't know if the NxN matrix has prime dimensions or not. If I checked for that, it would include excess calculations which would just slow down the function.

Cache blocks wouldn't be very useful, because no matter what, we will be accessing one array linearly (1,2,3,4) while the other we will be accessing in jumps of N. While we can get the function to abuse the cache and access the src block faster, it will still take a long time to place those into the dst matrix.

I have also tried using pointers instead of array accessors, but I don't think that actually speeds up the program in any way.

Any help would be greatly appreciated.

Thanks

like image 798
Glen Takahashi Avatar asked May 30 '12 05:05

Glen Takahashi


People also ask

How to transpose an array directly in R?

We can transpose an array directly in R using the inbuilt function t (). This function takes the array as a parameter and returns its transpose.

How do I transpose an array in NumPy?

numpy.ndarray.transpose () function returns a view of the array with axes transposed. For a 1-D array this has no effect, as a transposed vector is simply the same vector. For a 2-D array, this is a standard matrix transpose.

How to transpose a 2-dimensional array in Excel?

This function will Transpose a 2-dimensional array: To test this function, call the procedure TestTransposeArray: here an initial array testArr is created and outputArr is the final transposed array. Instead, you might want to transpose an array to Excel. To do so, you can use the Excel Transpose Worksheet Function.

How do you test a transpose array in Excel?

To test this function, call the procedure TestTransposeArray: here an initial array testArr is created and outputArr is the final transposed array. Instead, you might want to transpose an array to Excel. To do so, you can use the Excel Transpose Worksheet Function.


1 Answers

Cache blocking can be useful. For an example, lets say we have a cache line size of 64 bytes (which is what x86 uses these days). So for a large enough matrix such that it's larger than the cache size, then if we transpose a 16x16 block (since sizeof(int) == 4, thus 16 ints fit in a cache line, assuming the matrix is aligned on a cacheline bounday) we need to load 32 (16 from the source matrix, 16 from the destination matrix before we can dirty them) cache lines from memory and store another 16 lines (even though the stores are not sequential). In contrast, without cache blocking transposing the equivalent 16*16 elements requires us to load 16 cache lines from the source matrix, but 16*16=256 cache lines to be loaded and then stored for the destination matrix.

like image 103
janneb Avatar answered Oct 16 '22 18:10

janneb