Why does the result of Vector2.Normalize() change after calling it 34 times with identical inputs?

Tags: c#, .net, .net-core

Here is a simple C# .NET Core 3.1 program that calls System.Numerics.Vector2.Normalize() in a loop (with identical input every call) and prints out the resulting normalized vector:

using System;
using System.Numerics;
using System.Threading;

namespace NormalizeTest
{
    class Program
    {
        static void Main()
        {
            Vector2 v = new Vector2(9.856331f, -2.2437377f);
            for(int i = 0; ; i++)
            {
                Test(v, i);
                Thread.Sleep(100);
            }
        }

        static void Test(Vector2 v, int i)
        {
            v = Vector2.Normalize(v);
            Console.WriteLine($"{i:0000}: {v}");
        }
    }
}

And here is the output of running that program on my computer (truncated for brevity):

0000: <0.9750545, -0.22196561>
0001: <0.9750545, -0.22196561>
0002: <0.9750545, -0.22196561>
...
0031: <0.9750545, -0.22196561>
0032: <0.9750545, -0.22196561>
0033: <0.9750545, -0.22196561>
0034: <0.97505456, -0.22196563>
0035: <0.97505456, -0.22196563>
0036: <0.97505456, -0.22196563>
...

So my question is, why does the result of calling Vector2.Normalize(v) change from <0.9750545, -0.22196561> to <0.97505456, -0.22196563> after calling it 34 times? Is this expected, or is this a bug in the language/runtime?

asked Dec 22 '19 by Walt D

1 Answer

So my question is, why does the result of calling Vector2.Normalize(v) change from <0.9750545, -0.22196561> to <0.97505456, -0.22196563> after calling it 34 times?

First, why the change occurs: the result changes because the machine code that calculates those values changes too.

If we break into WinDbg early on, during the first executions of the code, and step a little way into the code that calculates the normalized vector, we see the following assembly (more or less - I've cut out some parts):

movss   xmm0,dword ptr [rax]
movss   xmm1,dword ptr [rax+4]
lea     rax,[rsp+40h]
movss   xmm2,dword ptr [rax]
movss   xmm3,dword ptr [rax+4]
mulss   xmm0,xmm2
mulss   xmm1,xmm3
addss   xmm0,xmm1
sqrtss  xmm0,xmm0
lea     rax,[rsp+40h]
movss   xmm1,dword ptr [rax]
movss   xmm2,dword ptr [rax+4]
xorps   xmm3,xmm3
movss   dword ptr [rsp+28h],xmm3
movss   dword ptr [rsp+2Ch],xmm3
divss   xmm1,xmm0
movss   dword ptr [rsp+28h],xmm1
divss   xmm2,xmm0
movss   dword ptr [rsp+2Ch],xmm2
mov     rax,qword ptr [rsp+28h]

and after ~30 executions (more on this number later) the code becomes:

vmovsd  xmm0,qword ptr [rsp+70h]
vmovsd  qword ptr [rsp+48h],xmm0
vmovsd  xmm0,qword ptr [rsp+48h]
vmovsd  xmm1,qword ptr [rsp+48h]
vdpps   xmm0,xmm0,xmm1,0F1h
vsqrtss xmm0,xmm0,xmm0
vinsertps xmm0,xmm0,xmm0,0Eh
vshufps xmm0,xmm0,xmm0,50h
vmovsd  qword ptr [rsp+40h],xmm0
vmovsd  xmm0,qword ptr [rsp+48h]
vmovsd  xmm1,qword ptr [rsp+40h]
vdivps  xmm0,xmm0,xmm1
vpslldq xmm0,xmm0,8
vpsrldq xmm0,xmm0,8
vmovq   rcx,xmm0

Different opcodes, different instruction set extensions - SSE vs AVX - and with different instruction sequences the intermediate rounding differs, so the calculations can differ in the last bits of precision.

So now more about the why. .NET Core (not sure about the version - assuming 3.0, but it was also tested in 2.1) has a feature called "tiered JIT compilation". At the beginning it produces code that is generated quickly but might not be optimal. Only later, when the runtime detects that the code is heavily used, does it spend additional time generating new, more optimized code. This is a new thing in .NET Core, so such behavior was not observed earlier.

Also, why 34 calls? This is a bit strange, as I would expect it to happen at around 30 executions, since that is the threshold at which tiered compilation kicks in. The constant can be seen in the source code of coreclr. Maybe there is some additional variability in when it kicks in.

Just to confirm that this is the case, you can disable tiered compilation by setting the environment variable with set COMPlus_TieredCompilation=0 and checking the execution again. The strange effect is gone:

C:\Users\lukas\source\repos\FloatMultiple\FloatMultiple\bin\Release\netcoreapp3.1
λ FloatMultiple.exe

0000: <0,9750545  -0,22196561>
0001: <0,9750545  -0,22196561>
0002: <0,9750545  -0,22196561>
...
0032: <0,9750545  -0,22196561>
0033: <0,9750545  -0,22196561>
0034: <0,9750545  -0,22196561>
0035: <0,97505456  -0,22196563>
0036: <0,97505456  -0,22196563>
^C
C:\Users\lukas\source\repos\FloatMultiple\FloatMultiple\bin\Release\netcoreapp3.1
λ set COMPlus_TieredCompilation=0

C:\Users\lukas\source\repos\FloatMultiple\FloatMultiple\bin\Release\netcoreapp3.1
λ FloatMultiple.exe

0000: <0,97505456  -0,22196563>
0001: <0,97505456  -0,22196563>
0002: <0,97505456  -0,22196563>
...
0032: <0,97505456  -0,22196563>
0033: <0,97505456  -0,22196563>
0034: <0,97505456  -0,22196563>
0035: <0,97505456  -0,22196563>
0036: <0,97505456  -0,22196563>

Is this expected, or is this a bug in the language/runtime?

There's already a bug reported for this: Issue 1119.

answered Nov 15 '22 by Paweł Łukasik