
C# stackalloc slower than regular variables?

I have 2 functions implementing uint128 multiplication in 2 different ways: one is using variables, the other using stackalloc "arrays".

Variable Version

public static UInt128 operator *(UInt128 i, UInt128 j) {

 ulong I0 = i._uint0; ulong I1 = i._uint1; ulong I2 = i._uint2; ulong I3 = i._uint3;
 ulong J0 = j._uint0; ulong J1 = j._uint1; ulong J2 = j._uint2; ulong J3 = j._uint3;
 ulong R0 = 0; ulong R1 = 0; ulong R2 = 0; ulong R3 = 0;

 if (I0 != 0) {
   R0 += I0 * J0;
   R1 += I0 * J1;
   R2 += I0 * J2;
   R3 += I0 * J3;
 }
 if (I1 != 0) {
   R1 += I1 * J0;
   R2 += I1 * J1;
   R3 += I1 * J2;
 }
 if (I2 != 0) {
   R2 += I2 * J0;
   R3 += I2 * J1;
 }
 R3 += I3 * J0;

 R1 += R0 >> 32; R0 &= uint.MaxValue;
 R2 += R1 >> 32; R1 &= uint.MaxValue;
 R3 += R2 >> 32; R2 &= uint.MaxValue;
 R3 &= uint.MaxValue;

 return new UInt128((uint)R3, (uint)R2, (uint)R1, (uint)R0);
}

Stackalloc Version

The index expressions [0 + 1], [1 + 1], etc. are kept for clarity only; the C# compiler folds them into constants anyway.

public unsafe static UInt128 operator *(UInt128 i, UInt128 j) {

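  var I = stackalloc ulong[4];
  var J = stackalloc ulong[4];
  // R is accumulated into below without being cleared first. This relies on the
  // stackalloc memory being zeroed, which the C# specification does not guarantee.
  var R = stackalloc ulong[4];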

  I[0] = i._uint0; I[1] = i._uint1; I[2] = i._uint2; I[3] = i._uint3;
  J[0] = j._uint0; J[1] = j._uint1; J[2] = j._uint2; J[3] = j._uint3;


  if (I[0] != 0) {
    R[0] += I[0] * J[0];
    R[0 + 1] += I[0] * J[1];
    R[0 + 2] += I[0] * J[2];
    R[0 + 3] += I[0] * J[3];
  }
  if (I[1] != 0) {
    R[1] += I[1] * J[0];
    R[1 + 1] += I[1] * J[1];
    R[1 + 2] += I[1] * J[2];
  }
  if (I[2] != 0) {
    R[2] += I[2] * J[0];
    R[2 + 1] += I[2] * J[1];
  }
  R[3] += I[3] * J[0];


  R[1] += R[0] >> 32; R[0] &= uint.MaxValue;
  R[2] += R[1] >> 32; R[1] &= uint.MaxValue;
  R[3] += R[2] >> 32; R[2] &= uint.MaxValue;
  R[3] &= uint.MaxValue;

  return new UInt128((uint)R[3], (uint)R[2], (uint)R[1], (uint)R[0]);
}

For some reason the "variable" version seems to be ~20% faster than the "stackalloc" version on both x86 and x64 (with optimizations enabled), using the C# 7.2 compiler and running on .NET 4.6.1. I haven't checked the performance on newer or older frameworks, but I suspect it will be similar, so my question is not specific to 4.6.1; it seems to be generally the case that stackalloc is slower.

Is there any reason the stackalloc version is slower, considering that both versions allocate exactly the same amount of memory (12 * sizeof(ulong)) and perform exactly the same operations in the same order? I would really prefer to work with arrays via stackalloc instead of variables.
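For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the original benchmark: `StackallocBench`, `MulLocals`, `MulStackalloc`, `TimeTicks` and `Run` are hypothetical names, and the multiplication is cut down to two 32-bit limbs (a 64-bit product) to keep it short. It assumes the project is compiled with unsafe code allowed and optimizations on. Calling through a delegate adds overhead of its own, so treat the numbers as a rough indication only.

```csharp
using System;
using System.Diagnostics;

public static class StackallocBench
{
    // Two-limb multiply using plain locals (hypothetical reduced version).
    public static ulong MulLocals(uint i0, uint i1, uint j0, uint j1)
    {
        ulong r0 = 0, r1 = 0;
        r0 += (ulong)i0 * j0;
        r1 += (ulong)i0 * j1;
        r1 += (ulong)i1 * j0;
        r1 += r0 >> 32; r0 &= uint.MaxValue;   // propagate the carry
        r1 &= uint.MaxValue;                   // keep the low 32 bits only
        return (r1 << 32) | r0;
    }

    // Same arithmetic, but every operand lives in a stackalloc buffer.
    public static unsafe ulong MulStackalloc(uint i0, uint i1, uint j0, uint j1)
    {
        ulong* I = stackalloc ulong[2];
        ulong* J = stackalloc ulong[2];
        ulong* R = stackalloc ulong[2];
        I[0] = i0; I[1] = i1;
        J[0] = j0; J[1] = j1;
        R[0] = 0; R[1] = 0;                    // clear explicitly, do not rely on zeroing
        R[0] += I[0] * J[0];
        R[1] += I[0] * J[1];
        R[1] += I[1] * J[0];
        R[1] += R[0] >> 32; R[0] &= uint.MaxValue;
        R[1] &= uint.MaxValue;
        return (R[1] << 32) | R[0];
    }

    // Times `iterations` calls of f and returns the elapsed Stopwatch ticks.
    public static long TimeTicks(Func<uint, uint, uint, uint, ulong> f, int iterations)
    {
        ulong sink = 0;
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            sink ^= f((uint)n, 7u, 11u, (uint)n);
        sw.Stop();
        GC.KeepAlive(sink);                    // keep the results observable
        return sw.ElapsedTicks;
    }

    public static void Run()
    {
        const int iterations = 50000000;
        TimeTicks(MulLocals, 1000000);         // warm-up so both methods are JIT-compiled
        TimeTicks(MulStackalloc, 1000000);
        Console.WriteLine("locals:     " + TimeTicks(MulLocals, iterations) + " ticks");
        Console.WriteLine("stackalloc: " + TimeTicks(MulStackalloc, iterations) + " ticks");
    }
}
```

Call `StackallocBench.Run()` from a `Main` method in a Release build, outside the debugger, and run it several times before reading anything into the ratio.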

asked May 17 '26 03:05 by Fit Dev


1 Answer

IL from the variable version (simplified)

IL from the array version (simplified)

The array version reads and writes its operands on the stack (see L0009 - L004E), whereas the variable version keeps them in registers. Even though the data fits into the CPU cache, a memory access is still slower than a register access.
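To make the register-versus-stack point concrete, here is a small sketch, not taken from the question. `LimbAccumulate` and its three methods are hypothetical names. It assumes unsafe code is allowed, and it assumes (as is typically the case, though the JIT is free to do otherwise) that accesses through a stackalloc pointer are treated as memory accesses while a plain local whose address is never taken can be kept in a register. The third method shows one possible compromise: keep the buffer for storage, but do the hot work in a local and write back once.

```csharp
public static class LimbAccumulate
{
    // Every iteration loads and stores R[0] through a pointer into the stack buffer.
    public static unsafe ulong InBuffer(uint a, uint b, int rounds)
    {
        ulong* R = stackalloc ulong[1];
        R[0] = 0;
        for (int n = 0; n < rounds; n++)
            R[0] += (ulong)a * b;
        return R[0];
    }

    // The accumulator is a plain local, so the JIT may keep it in a register.
    public static ulong InLocal(uint a, uint b, int rounds)
    {
        ulong r = 0;
        for (int n = 0; n < rounds; n++)
            r += (ulong)a * b;
        return r;
    }

    // Compromise: accumulate in a local, store into the buffer once at the end.
    public static unsafe ulong Hybrid(uint a, uint b, int rounds)
    {
        ulong* R = stackalloc ulong[1];
        ulong r = 0;
        for (int n = 0; n < rounds; n++)
            r += (ulong)a * b;
        R[0] = r;
        return R[0];
    }
}
```

All three return the same value for the same inputs; the difference, if any, only shows up in the generated machine code and the timings, so it is worth checking the disassembly on your own runtime before restructuring real code around it.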

answered May 19 '26 18:05 by sesky4


