Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does splitting a matrix in blocks improve cache hits?

Tags:

caching

matrix

I am trying to understand the concepts of caches and blocking in matrix. I am trying to transpose a matrix.

I understand the concept of row-wise memory layout, so I get that when I am trying to access the data row-wise, I get fewer cache-misses as compared to column wise.

  for( int i = 0; i < n; i++ )

    for( int j = 0; j < n; j++ )

      destination[j+i*n] = source[i+j*n];

So for the source matrix, I will have fewer cache misses, while for destination I will have more cache misses.


Here is code with blocking

for (int i = 0; i < n; i += blocksize) {
    for (int j = 0; j < n; j += blocksize) {
        // transpose the block beginning at [i,j]
        for (int k = i; k < i + blocksize; ++k) {
            for (int l = j; l < j + blocksize; ++l) {
                dst[k + l*n] = src[l + k*n];
            }
        }
    }
}

The code above makes use of blocking technique. I can't quite understand how does blocking help with the performance?

like image 252
Dude Avatar asked Mar 22 '17 02:03

Dude


1 Answers

Yes, you'll have more cache hits on one side than on the other.

The trick, though is to break it into small enough pieces, so that, they can be 'reused' while processing.

Matrices

E.g. in the example above, we'll have 1 cache miss on src matrix and 4 on dst size (I've picked cache line size of 4 elements and block size of 4 elements, but that's just coincidence).

If the cache size is greater than 5 lines, we will have no more misses while processing the line.

If the cache size is smaller than that, there will be more misses as the lines will be crowding out each other. In this case src will stay in cache as more used and dst one's will be dropped thus giving us 16 misses on dst side. 5 looks better than 17 :)

So by controlling the size of the block being low enough, we can have reduce cache miss rate.

like image 88
Ivan Avatar answered Dec 21 '22 22:12

Ivan