It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[]. This makes no sense to me: I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no?
I ran the test code below and got the following results:
Int for 1000 took 9896625613 tick (4227 msec)
UInt16 for 1000 took 6297688551 tick (2689 msec)
The test code does the following:
- Generates two random arrays a and b, once.
- Adds a and b, item-by-item, into a new result array. This is done 1000 times.
- This is done once for int[] a, b and once for UInt16[] a, b.
Every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me?
Here's the code, if you want to try it for yourself:
public static UInt16[] GenerateRandomDataUInt16(int length)
{
    UInt16[] noise = new UInt16[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (UInt16)random.Next();
    }
    return noise;
}
public static int[] GenerateRandomDataInt(int length)
{
    int[] noise = new int[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = random.Next();
    }
    return noise;
}
public static int[] AddInt(int[] a, int[] b)
{
    int len = a.Length;
    int[] result = new int[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = a[i] + b[i];
    }
    return result;
}
public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
{
    int len = a.Length;
    UInt16[] result = new UInt16[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (ushort)(a[i] + b[i]);
    }
    return result;
}
public static void Main()
{
    int count = 1000;
    int len = 128 * 6000;

    int[] aInt = GenerateRandomDataInt(len);
    int[] bInt = GenerateRandomDataInt(len);

    Stopwatch s = new Stopwatch();
    s.Start();
    for (int i = 0; i < count; ++i)
    {
        int[] resultInt = AddInt(aInt, bInt);
    }
    s.Stop();
    Console.WriteLine("Int for " + count
        + " took " + s.ElapsedTicks + " tick ("
        + s.ElapsedMilliseconds + " msec)");

    UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
    UInt16[] bUInt16 = GenerateRandomDataUInt16(len);

    s = new Stopwatch();
    s.Start();
    for (int i = 0; i < count; ++i)
    {
        UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
    }
    s.Stop();
    Console.WriteLine("UInt16 for " + count
        + " took " + s.ElapsedTicks + " tick ("
        + s.ElapsedMilliseconds + " msec)");
}
What you are seeing is a leaky abstraction. UInt16 takes half the memory that int does (16 vs. 32 bits).
This means the memory area occupied by a UInt16 array is half that of the equivalent int array, so more of it can fit in the processor cache and thus be accessed very quickly.
If you run that code on a processor with a larger cache, the difference is likely to be smaller.
Also try with much larger arrays.
Arrays are word-aligned, but there is no reason why the individual entries in an array should be word-aligned.
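To put rough numbers on the cache argument: with the question's array length of 128 * 6000 = 768,000 elements, each UInt16 operand array holds about 1.5 MB of data against about 3 MB for int, so the two operands plus the result come to roughly 4.5 MB versus 9 MB — a range where last-level cache size starts to matter. A minimal sketch of the arithmetic:

```csharp
using System;

class FootprintDemo
{
    static void Main()
    {
        int len = 128 * 6000; // same array length as in the question

        // sizeof(ushort) == 2 and sizeof(int) == 4, so the UInt16 arrays'
        // data is exactly half the size of the int arrays' data.
        long ushortBytes = (long)len * sizeof(ushort); // 1,536,000 bytes
        long intBytes = (long)len * sizeof(int);       // 3,072,000 bytes

        Console.WriteLine("UInt16 array data: " + ushortBytes + " bytes");
        Console.WriteLine("int array data:    " + intBytes + " bytes");
    }
}
```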
A couple of factors:
1) You are also timing the allocation of the result array, so it would be interesting to see how much time it takes just to do the adding versus creating the result array that gets passed back.
2) It would be interesting to see what IL is generated. Since your code is VERY straightforward (iterate and add), the compiler might be optimizing this, perhaps packing multiple UInt16 values into a larger register and doing multiple additions per instruction.
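For the first factor, one way to separate allocation cost from addition cost is to allocate the result buffer once, outside the timed loop. A sketch, where AddIntInto is a hypothetical variant of the question's AddInt (not from the original code):

```csharp
using System;
using System.Diagnostics;

class BufferedAdd
{
    // Hypothetical variant of the question's AddInt that writes into a
    // caller-supplied buffer instead of allocating a new array, so the
    // timed loop measures only the additions.
    public static void AddIntInto(int[] a, int[] b, int[] result)
    {
        for (int i = 0; i < a.Length; ++i)
        {
            result[i] = a[i] + b[i];
        }
    }

    static void Main()
    {
        int len = 128 * 6000;
        int count = 1000;
        int[] a = new int[len];
        int[] b = new int[len];
        int[] result = new int[len]; // allocated once, outside the timing

        Stopwatch s = Stopwatch.StartNew();
        for (int i = 0; i < count; ++i)
        {
            AddIntInto(a, b, result);
        }
        s.Stop();
        Console.WriteLine("Add only: " + s.ElapsedMilliseconds + " msec");
    }
}
```

Comparing this against the original timing shows how much of the gap comes from allocating (and garbage-collecting) 1000 result arrays rather than from the additions themselves.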
Just a SWAG: the smaller memory footprint of the UInt16 arrays improves memory characteristics (GC, cache, who knows what else). Since there don't seem to be too many allocations, I'd guess that cache is the main factor.
Also, be aware that benchmarking can be a tricky business - it looks like your times probably include some of the JIT compilation, which might be skewing the results. You might try reversing the order in which you test the int array and the UInt16 array and see whether the timings follow along or not.
Jon Skeet has (or had) a simple benchmark framework he coded up way back when that tried to take these effects into account. I don't know if it's still available (or even applicable); maybe he'll comment.
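A simple way to keep JIT compilation out of the measurement is to call each method once before starting the Stopwatch. A minimal sketch of the idea (a standalone demo, not the question's exact code):

```csharp
using System;
using System.Diagnostics;

class WarmupDemo
{
    public static int[] Add(int[] a, int[] b)
    {
        int[] r = new int[a.Length];
        for (int i = 0; i < a.Length; ++i)
        {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    static void Main()
    {
        int[] a = new int[128 * 6000];
        int[] b = new int[128 * 6000];

        // Warm-up call: forces JIT compilation of Add to happen
        // before the timing starts.
        Add(a, b);

        Stopwatch s = Stopwatch.StartNew();
        for (int i = 0; i < 1000; ++i)
        {
            Add(a, b);
        }
        s.Stop();
        Console.WriteLine("Warmed-up time: " + s.ElapsedMilliseconds + " msec");
    }
}
```

With a warm-up call for both AddInt and AddUInt16, the order of the two tests should no longer matter; if it still does, the remaining difference is coming from the code itself rather than from JIT overhead.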
I am not an expert in .NET, but I would check two things:
1) Whether an array of N int elements takes more time to work through than an array of N ushort elements, simply because it is twice the size. This can be tested using various array sizes and coding styles -- see my comment to the question. The numbers from your tests fit this theory :).
2) Adding two ushort variables can be implemented as adding two int values, with a result of type int -- without checking for overflow. And I assume that handling any kind of exception in code (including overflow exceptions) is a time-consuming task. This can be checked in the .NET documentation.