
int, short, byte performance in back-to-back for-loops

(background: Why should I use int instead of a byte or short in C#)

To satisfy my own curiosity about the pros and cons of using the "appropriate size" integer versus the "optimized" integer, I wrote the following code. It reinforced what I previously held true about int performance in .NET (and which is explained in the link above): that it is optimized for int performance rather than short or byte.

DateTime t;
long a, b, c;

t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}           
a = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;
for (short index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
        
b = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;           
for (byte index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
c = DateTime.Now.Ticks - t.Ticks;

Console.WriteLine(a.ToString());
Console.WriteLine(b.ToString());
Console.WriteLine(c.ToString());

This gives roughly consistent results (in ticks, for int, short and byte respectively) in the area of...

~950000

~2000000

~1700000

Which is in line with what I would expect to see.

However when I try repeating the loops for each data type like this...

t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
a = DateTime.Now.Ticks - t.Ticks;

The numbers are more like...

~4500000

~3100000

~300000

Which I find puzzling. Can anyone offer an explanation?

NOTE: In the interest of comparing like for like, I've limited the loops to 127 because of the range of the byte value type. Also, this is an act of curiosity, not production code micro-optimization.

gingerbreadboy asked Apr 07 '10 16:04


4 Answers

First of all, it's not .NET that's optimized for int performance, it's the machine that's optimized because 32 bits is the native word size (unless you're on x64, in which case it's long or 64 bits).

Second, you're writing to the console inside each loop. That's going to be far more expensive than incrementing and testing the loop counter, so you're not measuring anything realistic here.

Third, a byte has a range up to 255, so you can loop 255 times with index < 255 and don't need to stop at 127. (If you write index <= 255, the counter wraps around to 0 and the loop never ends.)

Fourth, you're not doing anywhere near enough iterations to profile. Iterating a tight loop 127 or even 255 times is meaningless. What you should be doing is putting the byte/short/int loop inside another loop that iterates a much larger number of times, say 10 million, and checking the results of that.

Finally, using DateTime.Now within calculations is going to result in some timing "noise" while profiling. It's recommended (and easier) to use the Stopwatch class instead.

Bottom line, this needs many changes before it can be a valid perf test.
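The point about the byte range can be checked with a minimal sketch like the one below. The class and method names (ByteLoopDemo, CountIterations, IncrementPastMax) are hypothetical, chosen only for illustration, and the sketch assumes the default unchecked arithmetic context.

```csharp
using System;

public static class ByteLoopDemo
{
    // Counts how many times a byte-indexed loop runs for an exclusive upper bound.
    public static int CountIterations(byte limit)
    {
        int count = 0;
        for (byte index = 0; index < limit; index++)
        {
            count++;
        }
        return count;
    }

    // Increments a byte past byte.MaxValue in an explicitly unchecked block.
    // The value wraps around to 0, which is why "index <= 255" never terminates.
    public static byte IncrementPastMax()
    {
        byte value = byte.MaxValue; // 255
        unchecked { value++; }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(CountIterations(127)); // prints 127
        Console.WriteLine(CountIterations(255)); // prints 255
        Console.WriteLine(IncrementPastMax());   // prints 0
    }
}
```

In a checked context the same increment would throw an OverflowException instead of wrapping.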


Here's what I'd consider to be a more accurate test program:

using System;
using System.Diagnostics;

class Program
{
    const int TestIterations = 5000000;

    static void Main(string[] args)
    {
        RunTest("Byte Loop", TestByteLoop, TestIterations);
        RunTest("Short Loop", TestShortLoop, TestIterations);
        RunTest("Int Loop", TestIntLoop, TestIterations);
        Console.ReadLine();
    }

    static void RunTest(string testName, Action action, int iterations)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: Elapsed Time = {1}", testName, sw.Elapsed);
    }

    static void TestByteLoop()
    {
        int x = 0;
        for (byte b = 0; b < 255; b++)
            ++x;
    }

    static void TestShortLoop()
    {
        int x = 0;
        for (short s = 0; s < 255; s++)
            ++x;
    }

    static void TestIntLoop()
    {
        int x = 0;
        for (int i = 0; i < 255; i++)
            ++x;
    }
}

This runs each loop inside a much larger loop (5 million iterations) and performs a very simple operation inside the loop (increments a variable). The results for me were:

Byte Loop: Elapsed Time = 00:00:03.8949910
Short Loop: Elapsed Time = 00:00:03.9098782
Int Loop: Elapsed Time = 00:00:03.2986990

So, no appreciable difference.

Also, make sure you profile in release mode. A lot of people forget and test in debug mode, which will be significantly less accurate.
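One way to catch an accidental debug-mode run is to have the benchmark warn about it before timing anything. This is only a sketch: BenchmarkGuard and its methods are hypothetical names, and it assumes the build defines the DEBUG symbol in debug configurations (the Visual Studio default, but a project can change it).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class BenchmarkGuard
{
    // True when the code was compiled with the DEBUG symbol defined.
    // Assumption: the project keeps the default DEBUG define for debug builds.
    public static bool IsDebugBuild()
    {
#if DEBUG
        return true;
#else
        return false;
#endif
    }

    // Collects a warning for each condition likely to distort timings.
    public static string[] GetWarnings()
    {
        List<string> warnings = new List<string>();
        if (IsDebugBuild())
        {
            warnings.Add("Compiled in debug configuration.");
        }
        if (Debugger.IsAttached)
        {
            warnings.Add("A debugger is attached.");
        }
        return warnings.ToArray();
    }

    public static void Main()
    {
        foreach (string warning in GetWarnings())
        {
            Console.WriteLine("Warning: " + warning);
        }
    }
}
```

Calling GetWarnings() at the top of Main in a test program makes a misconfigured run visible in the output rather than silently skewing the numbers.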

Aaronaught answered Nov 01 '22 05:11


The majority of this time is probably spent writing to the console. Try doing something other than that in the loop...

Additionally:

  • Using DateTime.Now is a bad way of measuring time. Use System.Diagnostics.Stopwatch instead.
  • Once you've got rid of the Console.WriteLine call, a loop of 127 iterations is going to be too short to measure. You need to run the loop lots of times to get a sensible measurement.
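As a rough illustration of the first bullet, the sketch below times a single action with Stopwatch and reports the timer's resolution. TimerDemo and Time are hypothetical names, and the 50 ms sleep is just a stand-in workload.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimerDemo
{
    // Runs the action once and returns the elapsed time measured by Stopwatch.
    public static TimeSpan Time(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // IsHighResolution reports whether a high-resolution performance
        // counter backs the timer; Frequency is its tick rate per second.
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Ticks per second: {0}", Stopwatch.Frequency);

        TimeSpan elapsed = Time(() => Thread.Sleep(50));
        Console.WriteLine("Elapsed: {0} ms", elapsed.TotalMilliseconds);
    }
}
```

Stopwatch is intended for measuring intervals, whereas DateTime.Now reads the wall clock, which is why it suits benchmarks less well.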

Here's my benchmark:

using System;
using System.Diagnostics;

public static class Test
{    
    const int Iterations = 100000;

    static void Main(string[] args)
    {
        Measure(ByteLoop);
        Measure(ShortLoop);
        Measure(IntLoop);
        Measure(BackToBack);
        Measure(DelegateOverhead);
    }

    static void Measure(Action action)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}ms", action.Method.Name,
                          sw.ElapsedMilliseconds);
    }

    static void ByteLoop()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void ShortLoop()
    {
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void IntLoop()
    {
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void BackToBack()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void DelegateOverhead()
    {
        // Nothing. Let's see how much
        // overhead there is just for calling
        // this repeatedly...
    }
}

And the results:

ByteLoop: 6585ms
ShortLoop: 6342ms
IntLoop: 6404ms
BackToBack: 19757ms
DelegateOverhead: 1ms

(This is on a netbook - adjust the number of iterations until you get something sensible :)

That seems to show that it makes basically no significant difference which type you use.

Jon Skeet answered Nov 01 '22 06:11


Just out of curiosity, I slightly modified Aaronaught's program and compiled it in both x86 and x64 modes. Strangely, int is much faster in x64:

x86

Byte Loop: Elapsed Time = 00:00:00.8636454
Short Loop: Elapsed Time = 00:00:00.8795518
UShort Loop: Elapsed Time = 00:00:00.8630357
Int Loop: Elapsed Time = 00:00:00.5184154
UInt Loop: Elapsed Time = 00:00:00.4950156
Long Loop: Elapsed Time = 00:00:01.2941183
ULong Loop: Elapsed Time = 00:00:01.3023409

x64

Byte Loop: Elapsed Time = 00:00:01.0646588
Short Loop: Elapsed Time = 00:00:01.0719330
UShort Loop: Elapsed Time = 00:00:01.0711545
Int Loop: Elapsed Time = 00:00:00.2462848
UInt Loop: Elapsed Time = 00:00:00.4708777
Long Loop: Elapsed Time = 00:00:00.5242272
ULong Loop: Elapsed Time = 00:00:00.5144035

ialiashkevich answered Nov 01 '22 04:11


I tried out the two programs above as they looked like they would produce different and possibly conflicting results on my dev machine.

Outputs from Aaronaught's test harness

Short Loop: Elapsed Time = 00:00:00.8299340
Byte Loop: Elapsed Time = 00:00:00.8398556
Int Loop: Elapsed Time = 00:00:00.3217386
Long Loop: Elapsed Time = 00:00:00.7816368

ints are much quicker

Outputs from Jon's

ByteLoop: 1126ms
ShortLoop: 1115ms
IntLoop: 1096ms
BackToBack: 3283ms
DelegateOverhead: 0ms

nothing in it

Jon's results include the large fixed cost of calling ToString, which may be hiding benefits that could show up if less work were done in the loop. Aaronaught is using a 32-bit OS, which doesn't seem to benefit from using ints as much as the x64 rig I am using.
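The effect of the loop body on the measurement can be sketched by timing the same int loop with two different bodies. This is an illustrative sketch rather than either answerer's code: LoopBodyCost, WithToString, WithIncrement and Sink are hypothetical names, and the results are accumulated into a static field on the assumption that this keeps the JIT from discarding the work.

```csharp
using System;
using System.Diagnostics;

public static class LoopBodyCost
{
    const int Iterations = 100000;

    // Receives every result so the measured work stays observable.
    public static long Sink;

    // Body converts the counter to a string on every pass, as in Jon's benchmark.
    public static int WithToString()
    {
        int total = 0;
        for (int index = 0; index < 127; index++)
        {
            total += index.ToString().Length;
        }
        return total;
    }

    // Body only does integer arithmetic, closer to Aaronaught's benchmark.
    public static int WithIncrement()
    {
        int total = 0;
        for (int index = 0; index < 127; index++)
        {
            total += index;
        }
        return total;
    }

    // Runs the body Iterations times and returns the elapsed milliseconds.
    public static long Measure(Func<int> body)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            Sink += body();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("WithToString: {0}ms", Measure(WithToString));
        Console.WriteLine("WithIncrement: {0}ms", Measure(WithIncrement));
    }
}
```

If the ToString variant takes far longer, any difference between counter types is a small fraction of that total and easy to miss.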

Hardware / Software: Results were collected on a Core i7 975 at 3.33GHz with turbo disabled and the core affinity set to reduce the impact of other tasks. Performance settings were all set to maximum, and the virus scanner and unnecessary background tasks were suspended. Windows 7 x64 Ultimate with 11 GB of spare RAM and very little IO activity. Run in release config, built in VS 2008, without a debugger or profiler attached.

Repeatability: Originally repeated 10 times, changing the order of execution for each test. Variation was negligible, so I only posted my first result. Under max CPU load the ratio of execution times stayed consistent. Repeat runs on multiple x64 XP Xeon blades give roughly the same results after taking into account CPU generation and GHz.

Profiling: Redgate / JetBrains / SlimTune / CLR Profiler and my own profiler all indicate that the results are correct.

Debug Build: Using the debug settings in VS gives consistent results, like Aaronaught's.

Steve answered Nov 01 '22 05:11