Why is setting a field many times slower than getting a field?

I already knew that setting a field is much slower than setting a local variable, but it also appears that setting a field with a local variable is much slower than setting a local variable with a field. Why is this? In either case the address of the field is used.

public class Test
{
    public int A = 0;
    public int B = 4;

    public void Method1() // Set local with field
    {
        int a = A;

        for (int i = 0; i < 100; i++)
        {
            a += B;
        }

        A = a;
    }

    public void Method2() // Set field with local
    {
        int b = B;

        for (int i = 0; i < 100; i++)
        {
            A += b;
        }
    }
}

The benchmark results with 10e+6 (10 million) iterations are:

Method1: 28.1321 ms
Method2: 162.4528 ms
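
The harness used to get these numbers is not shown. A minimal sketch of how such timings could be collected, assuming a Stopwatch-based measurement with one warm-up call, might look like this. The Benchmark class, the TimeMs helper and the Iterations constant are hypothetical names introduced for illustration, and the loop bound of 10 million is an assumption based on the iteration count quoted above.

```csharp
using System;
using System.Diagnostics;

public class Test
{
    // Assumed loop bound (10 million), not taken from the original listing.
    public const int Iterations = 10_000_000;

    public int A = 0;
    public int B = 4;

    public void Method1() // Set local with field
    {
        int a = A;

        for (int i = 0; i < Iterations; i++)
        {
            a += B;
        }

        A = a;
    }

    public void Method2() // Set field with local
    {
        int b = B;

        for (int i = 0; i < Iterations; i++)
        {
            A += b;
        }
    }
}

public static class Benchmark
{
    // Hypothetical helper: runs the action once as a warm-up so that JIT
    // compilation is not part of the measurement, then times a second run.
    public static double TimeMs(Action action)
    {
        action();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        var t = new Test();
        Console.WriteLine($"Method1: {TimeMs(t.Method1):F4} ms");
        Console.WriteLine($"Method2: {TimeMs(t.Method2):F4} ms");
    }
}
```

Timings from a sketch like this depend heavily on the build configuration (Release vs. Debug), the JIT version and whether a debugger is attached, so the absolute numbers will differ from the ones above.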
toplel32 asked Nov 24 '14 18:11


2 Answers

Running this on my machine, I get similar time differences. Looking at the JITted code for 10M iterations (Method A and Method B below correspond to Method1 and Method2 in the question), it is clear why:

Method A:

mov     r8,rcx
; "A" is loaded into eax
mov     eax,dword ptr [r8+8]
xor     edx,edx
; "B" is loaded into ecx
mov     ecx,dword ptr [r8+0Ch]
nop     dword ptr [rax]
loop_start:
; Partially unrolled loop, all additions done in registers
add     eax,ecx
add     eax,ecx
add     eax,ecx
add     eax,ecx
add     edx,4
cmp     edx,989680h
jl      loop_start
; Store the sum in eax back to "A"
mov     dword ptr [r8+8],eax
ret

And Method B:

; "B" is loaded into edx
mov     edx,dword ptr [rcx+0Ch]
xor     r8d,r8d
nop word ptr [rax+rax]
loop_start:
; Partially unrolled loop, but each iteration requires reading "A" from memory
; adding "B" to it, and then writing the new "A" back to memory.
mov     eax,dword ptr [rcx+8]
add     eax,edx
mov     dword ptr [rcx+8],eax
mov     eax,dword ptr [rcx+8]
add     eax,edx
mov     dword ptr [rcx+8],eax
mov     eax,dword ptr [rcx+8]
add     eax,edx
mov     dword ptr [rcx+8],eax
mov     eax,dword ptr [rcx+8]
add     eax,edx
mov     dword ptr [rcx+8],eax
add     r8d,4
cmp     r8d,989680h
jl      loop_start
rep ret

As you can see from the assembly, Method A is going to be significantly faster since the values of A and B are both put in registers, and all of the additions occur there with no intermediate writes to memory. Method B on the other hand incurs a load and store to "A" in memory for every single iteration.
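
If the per-iteration load and store is the cost, one way to avoid it is to do the hoisting by hand: accumulate into a local and write the field once after the loop, which is essentially what Method1 already does. A minimal sketch, assuming no other thread needs to observe the intermediate values of A (the HoistedTest class and Method2Hoisted name are hypothetical, introduced only for illustration):

```csharp
public class HoistedTest
{
    public int A = 0;
    public int B = 4;

    // Hypothetical variant of Method2: same final value of A, but the
    // running sum lives in a local, so the JIT is free to keep it in a register.
    public void Method2Hoisted()
    {
        int a = A; // read the field once
        int b = B;

        for (int i = 0; i < 100; i++)
        {
            a += b; // no field access inside the loop
        }

        A = a; // write the field once
    }
}
```

Whether this actually closes the gap depends on the JIT in use, so it is worth measuring rather than assuming.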

Iridium answered Oct 28 '22 01:10


In case 1, the local a is clearly kept in a register. Anything else would be a horrible compilation result.

Probably, the .NET JIT is not willing/able to convert the stores to A to register stores in case 2.

I doubt this is forced by the .NET memory model because other threads can never tell the difference between your two methods if they only observe A to be 0 or the sum. They cannot disprove the theory that the optimization never happened. That makes it allowed under the semantics of the .NET abstract machine.

It is not surprising to see the .NET JIT perform few optimizations. This is well known to followers of the performance tag on Stack Overflow.

I know from experience that the JIT is much more likely to cache memory loads in registers. That's why case 1 (apparently) does not access B with each iteration.

Register computations are cheaper than memory accesses. This is true even if the memory in question is in the CPU L1 cache (as is the case here).

"I thought only locals were eligible for CPU caching?"

This cannot be so because the CPU does not even know what a local is. All addresses look the same.

usr answered Oct 28 '22 00:10