Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do UInt16 arrays seem to add faster than int arrays?

Tags:

arrays

c#

.net

clr

It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[]. This makes no sense to me, since I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no?

I ran the test-code below, and got the following results:

Int    for 1000 took 9896625613 tick (4227 msec)
UInt16 for 1000 took 6297688551 tick (2689 msec)

The test code does the following:

  1. Creates two arrays named a and b, once.
  2. Fills them with random data, once.
  3. Starts a stopwatch.
  4. Adds a and b, item-by-item. This is done 1000 times.
  5. Stops the stopwatch.
  6. Reports how long it took.

This is done for int[] a, b and for UInt16 a,b. And every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me?

Here's the code, if you want to try if for yourself:

public static UInt16[] GenerateRandomDataUInt16(int length)
{
    UInt16[] noise = new UInt16[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (UInt16)random.Next();
    }

    return noise;
}

public static int[] GenerateRandomDataInt(int length)
{
    int[] noise = new int[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (int)random.Next();
    }

    return noise;
}

public static int[] AddInt(int[] a, int[] b)
{
    int len = a.Length;
    int[] result = new int[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (int)(a[i] + b[i]);
    }
    return result;
}

public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
{
    int len = a.Length;
    UInt16[] result = new UInt16[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (ushort)(a[i] + b[i]);
    }
    return result;
}


public static void Main()
{
    int count = 1000;
    int len = 128 * 6000;

    int[] aInt = GenerateRandomDataInt(len);
    int[] bInt = GenerateRandomDataInt(len);

    Stopwatch s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        int[] resultInt = AddInt(aInt, bInt);
    }
    s.Stop();
    Console.WriteLine("Int    for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");

    UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
    UInt16[] bUInt16 = GenerateRandomDataUInt16(len);

    s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
    }
    s.Stop();
    Console.WriteLine("UInt16 for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");


}
like image 691
Shalom Craimer Avatar asked Apr 13 '10 07:04

Shalom Craimer


5 Answers

What happens is that you see a leaky abstraction. UInt16 takes half of memory that int does (16 vs. 32 bit).

This means that the memory area occupied by int16 array takes half of area that the int32 does. So more of that area can fit in processor cache and thus be accessed very quickly.

You could try that code on a processor that has more cache and the difference is likely to be smaller.

Also try with much larger arrays.

like image 200
Pasi Savolainen Avatar answered Nov 15 '22 18:11

Pasi Savolainen


Arrays are word aligned, but there is no reason why entries in the array should be word aligned.

like image 33
Dipstick Avatar answered Nov 15 '22 16:11

Dipstick


Couple of factors

1) You are also timing the generation of the resultant array..so it would be interesting to see how much time it took to just to the adding vs creating the result array that gets passed back

2) It would be interesting to see what IL is generated. Since your code is VERY straightforward (iterate and add), the compiler might be optimizing this, maybe stuffing multiple uint16's in a larger register and doing multiple additions per instruction

like image 33
puffpio Avatar answered Nov 15 '22 18:11

puffpio


Just a SWAG: the smaller memory use of the UInt16 arrays has improved memory characteristics (GC, cache, who knows what else). Since there doesn't seem to be too many allocations, I'd guess that cache is the main factor.

Also, you should take care that benchmarking can be a tricky business - it looks like your times are probably including some of the JIT compilation, which might be skewing results. You might try reversing the order that you test the int array with the UInt16 array and see if the timings follow along or not.

Jon Skeet has (or had) a simple benchmark framework he coded up way back when that tried to take these effects into account. I don't know if it's still available (or even applicable); maybe he'll comment.

like image 26
Michael Burr Avatar answered Nov 15 '22 16:11

Michael Burr


I am not expert in .NET but I would check two things:

  1. Passing larger array (N elements of type int) takes more time then array of N ushort elements. This can tested using various size of arrays and style of coding -- see my comment to question). Numbers from your tests fit this theory :).
  2. Adding two ushort variables can be implemented as adding two int with result of type int -- without checking overflowing. And I assume that handling in code any kind of exception (including overflow exception) is time consuming task. This can be checked in .NET documentation.
like image 28
Grzegorz Gierlik Avatar answered Nov 15 '22 17:11

Grzegorz Gierlik