
Is reading a "zero" from memory faster than reading other values?

Tags: c, memory, time

I am running a memory access experiment that uses a 2D matrix in which each row is the size of a memory page. The experiment consists of reading every element in row-major and then column-major order, and then writing to every element in row-major and then column-major order. The matrix is declared at global scope to ease the programming requirements.
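
To make the setup concrete, it looks roughly like the sketch below. Only testArray, ROW_COUNT and COL_COUNT appear in my actual code; the specific sizes and the PAGE_SIZE name are placeholders for illustration.

#define PAGE_SIZE 4096              /* assumed page size */
#define ROW_COUNT 512               /* assumed number of rows (pages) */
#define COL_COUNT PAGE_SIZE         /* each row spans exactly one page */

/* Global, so it has static storage duration: it is zero-initialized
   and typically placed in the .bss section of the executable. */
char testArray[ROW_COUNT][COL_COUNT];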

The point of this question is that, with the test matrix being declared statically, the values are initialized to zero by the compiler, and the results I found were quite interesting. When I did the read operations first, i.e.

rowMajor_read();
colMajor_read();
rowMajor_write();
colMajor_write(); 

Then my colMajor_read operation finished very quickly.

However, if I do the write operations before reading we have:

rowMajor_write();
colMajor_write();
rowMajor_read();
colMajor_read(); 


And the time for the column-major read has increased by nearly an order of magnitude.

I figured that it must have something to do with how the compiler optimizes the code. Since the global matrix was identically zero for every element, did the compiler completely remove the read operations? Or is it somehow "easier" to read a value from memory that is identically zero?

I do not pass any special compiler flags for optimization, but I did declare my functions in this manner:

inline void colMajor_read(){
    register int row, col;
    register volatile char temp __attribute__((unused));
    /* Column-major traversal: the outer loop walks columns, so
       consecutive accesses land on different rows, i.e. different pages. */
    for(col = 0; col < COL_COUNT; col++)
        for(row = 0; row < ROW_COUNT; row++)
            temp = testArray[row][col];
}

I declared it this way because I was running into issues where the compiler completely removed the temp variable from the above function, since it was never used. I think that having both volatile and __attribute__((unused)) is redundant, but I included both nonetheless. I was under the impression that no optimizations are applied to a volatile variable.
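
As an aside, a common alternative for keeping the reads observable without a volatile local (just a sketch, not the code I actually ran) is to accumulate the values into a result that the caller then uses, so the loads cannot be removed as dead code:

unsigned colMajor_read_sum(void){
    unsigned sum = 0;
    int row, col;
    for(col = 0; col < COL_COUNT; col++)
        for(row = 0; row < ROW_COUNT; row++)
            sum += (unsigned char)testArray[row][col];
    return sum;   /* print or otherwise consume this in the caller */
}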

Any ideas?


I looked at the generated assembly and the results are identical for the colMajor_read function. The non-inline version of the assembly: http://pastebin.com/C8062fYB

asked Oct 31 '14 by sherrellbc


1 Answer

Check the memory usage of your process before and after writing out values to the matrix. If it's stored in the .bss section on Linux, for example, the zeroed pages will be mapped to a single read-only page with copy-on-write semantics. So, even though you're reading through a bunch of addresses, you may be reading the same page of physical memory over and over.
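
As a quick check (a sketch assuming Linux; the array name and sizes here are made up for illustration, and the exact numbers depend on the kernel's accounting), compare the process's peak resident set size after a read-only pass and after a write pass over a large zero-initialized global array:

#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

#define ROWS 1024
#define COLS 4096

static char big[ROWS][COLS];          /* zero-initialized, ends up in .bss */

static long max_rss_kb(void){
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;              /* peak resident set size, in kB on Linux */
}

int main(void){
    volatile char sink;
    int r, c;
    for(r = 0; r < ROWS; r++)         /* read-only pass: faults can map the shared zero page */
        for(c = 0; c < COLS; c++)
            sink = big[r][c];
    printf("after reads : %ld kB\n", max_rss_kb());

    memset(big, 1, sizeof big);       /* write pass: each page gets its own private copy */
    printf("after writes: %ld kB\n", max_rss_kb());
    (void)sink;
    return 0;
}

The write pass should grow the resident set by roughly ROWS * COLS bytes, while the read-only pass barely moves it.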

This page http://madalanarayana.wordpress.com/2014/01/22/bss-segment/ has a good explanation.

If that's the case, zero out the matrix again afterward and rerun your read test; it should no longer be so much faster.
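
In terms of the call sequence from the question, that rerun would look roughly like this (memset still writes to every page, so the re-zeroed pages stay as private copies rather than mapping back to the shared zero page):

rowMajor_write();
colMajor_write();
memset(testArray, 0, sizeof testArray);  /* values are zero again, but every page is now dirty */
rowMajor_read();
colMajor_read();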

answered Nov 16 '22 by FatalError