(background: Why should I use int instead of a byte or short in C#)
To satisfy my own curiosity about the pros and cons of using the "appropriate size" integer vs the "optimized" integer, I wrote the following code, which reinforced what I previously held true about int performance in .NET (and which is explained in the link above): that it is optimized for int performance rather than short or byte.
DateTime t;
long a, b, c;

t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
a = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;
for (short index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
b = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;
for (byte index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
c = DateTime.Now.Ticks - t.Ticks;

Console.WriteLine(a.ToString());
Console.WriteLine(b.ToString());
Console.WriteLine(c.ToString());
This gives roughly consistent results in the area of...
~950000
~2000000
~1700000
Which is in line with what I would expect to see.
However when I try repeating the loops for each data type like this...
t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
a = DateTime.Now.Ticks - t.Ticks;
The numbers are more like...
~4500000
~3100000
~300000
Which I find puzzling. Can anyone offer an explanation?
NOTE: In the interest of comparing like for like, I've limited the loops to 127 because of the range of the byte value type. Also, this is an act of curiosity, not production code micro-optimization.
First of all, it's not .NET that's optimized for int performance, it's the machine that's optimized, because 32 bits is the native word size (unless you're on x64, in which case it's long, or 64 bits).
Second, you're writing to the console inside each loop - that's going to be far more expensive than incrementing and testing the loop counter, so you're not measuring anything realistic here.
Third, a byte has a range up to 255, so you can loop 254 times (if you try to do 255 it will overflow and the loop will never end) - but you don't need to stop at 127.
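A minimal sketch of that overflow (a hypothetical b <= 255 condition, not the question's code):

for (byte b = 0; b <= 255; b++)  // the condition is always true: a byte can never exceed 255
{
    // incrementing past 255 wraps b back to 0 (unchecked by default),
    // so this loop never terminates
}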
Fourth, you're not doing anywhere near enough iterations to profile. Iterating a tight loop 128 or even 254 times is meaningless. What you should be doing is putting the byte/short/int loop inside another loop that iterates a much larger number of times, say 10 million, and checking the results of that.
Finally, using DateTime.Now within calculations is going to result in some timing "noise" while profiling. It's recommended (and easier) to use the Stopwatch class instead.
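As a minimal sketch of that substitution, applied to the question's first loop (this assumes using System.Diagnostics):

Stopwatch sw = Stopwatch.StartNew();
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
sw.Stop();
// note: Stopwatch ticks are based on Stopwatch.Frequency, not DateTime ticks
Console.WriteLine(sw.ElapsedTicks);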
Bottom line, this needs many changes before it can be a valid perf test.
Here's what I'd consider to be a more accurate test program:
using System;
using System.Diagnostics;

class Program
{
    const int TestIterations = 5000000;

    static void Main(string[] args)
    {
        RunTest("Byte Loop", TestByteLoop, TestIterations);
        RunTest("Short Loop", TestShortLoop, TestIterations);
        RunTest("Int Loop", TestIntLoop, TestIterations);
        Console.ReadLine();
    }

    static void RunTest(string testName, Action action, int iterations)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: Elapsed Time = {1}", testName, sw.Elapsed);
    }

    static void TestByteLoop()
    {
        int x = 0;
        for (byte b = 0; b < 255; b++)
            ++x;
    }

    static void TestShortLoop()
    {
        int x = 0;
        for (short s = 0; s < 255; s++)
            ++x;
    }

    static void TestIntLoop()
    {
        int x = 0;
        for (int i = 0; i < 255; i++)
            ++x;
    }
}
This runs each loop inside a much larger loop (5 million iterations) and performs a very simple operation inside the loop (increments a variable). The results for me were:
Byte Loop: Elapsed Time = 00:00:03.8949910
Short Loop: Elapsed Time = 00:00:03.9098782
Int Loop: Elapsed Time = 00:00:03.2986990
So, no appreciable difference.
Also, make sure you profile in release mode; a lot of people forget and test in debug mode, which will be significantly less accurate.
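One way to catch that mistake automatically is to add a couple of guards to Main - a sketch only, and my own suggestion rather than part of the harness above:

#if DEBUG
Console.WriteLine("Warning: Debug build - timings will not be representative.");
#endif
if (System.Diagnostics.Debugger.IsAttached)
{
    Console.WriteLine("Warning: debugger attached - the JIT may disable optimizations.");
}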
The majority of this time is probably spent writing to the console. Try doing something other than that in the loop...
Additionally:
- DateTime.Now is a bad way of measuring time. Use System.Diagnostics.Stopwatch instead.
- Even aside from the Console.WriteLine call, a loop of 127 iterations is going to be too short to measure. You need to run the loop lots of times to get a sensible measurement.

Here's my benchmark:
using System;
using System.Diagnostics;

public static class Test
{
    const int Iterations = 100000;

    static void Main(string[] args)
    {
        Measure(ByteLoop);
        Measure(ShortLoop);
        Measure(IntLoop);
        Measure(BackToBack);
        Measure(DelegateOverhead);
    }

    static void Measure(Action action)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}ms", action.Method.Name,
                          sw.ElapsedMilliseconds);
    }

    static void ByteLoop()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void ShortLoop()
    {
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void IntLoop()
    {
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void BackToBack()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void DelegateOverhead()
    {
        // Nothing. Let's see how much
        // overhead there is just for calling
        // this repeatedly...
    }
}
And the results:
ByteLoop: 6585ms
ShortLoop: 6342ms
IntLoop: 6404ms
BackToBack: 19757ms
DelegateOverhead: 1ms
(This is on a netbook - adjust the number of iterations until you get something sensible :)
That seems to show that it makes basically no significant difference which type you use.
Just out of curiosity I modified Aaronaught's program a little and compiled it in both x86 and x64 modes. Strangely, int works much faster in x64:
x86
Byte Loop: Elapsed Time = 00:00:00.8636454
Short Loop: Elapsed Time = 00:00:00.8795518
UShort Loop: Elapsed Time = 00:00:00.8630357
Int Loop: Elapsed Time = 00:00:00.5184154
UInt Loop: Elapsed Time = 00:00:00.4950156
Long Loop: Elapsed Time = 00:00:01.2941183
ULong Loop: Elapsed Time = 00:00:01.3023409
x64
Byte Loop: Elapsed Time = 00:00:01.0646588
Short Loop: Elapsed Time = 00:00:01.0719330
UShort Loop: Elapsed Time = 00:00:01.0711545
Int Loop: Elapsed Time = 00:00:00.2462848
UInt Loop: Elapsed Time = 00:00:00.4708777
Long Loop: Elapsed Time = 00:00:00.5242272
ULong Loop: Elapsed Time = 00:00:00.5144035
I tried out the two programs above as they looked like they would produce different and possibly conflicting results on my dev machine.
Outputs from Aaronaught's test harness:
Short Loop: Elapsed Time = 00:00:00.8299340
Byte Loop: Elapsed Time = 00:00:00.8398556
Int Loop: Elapsed Time = 00:00:00.3217386
Long Loop: Elapsed Time = 00:00:00.7816368
ints are much quicker
Outputs from Jon's test harness:
ByteLoop: 1126ms
ShortLoop: 1115ms
IntLoop: 1096ms
BackToBack: 3283ms
DelegateOverhead: 0ms
nothing in it
Jon has the big fixed cost of calling ToString in his results, which may be hiding the possible benefits that could occur if the work done in the loop were less. Aaronaught is using a 32-bit OS, which doesn't seem to benefit from using ints as much as the x64 rig I am using.
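To see past that fixed cost, one option is to swap the ToString call in Jon's loops for cheap arithmetic. A minimal sketch of such a variant (ByteLoopNoToString and the sink field are my additions, not code from either answer):

static int sink;  // keeps the result alive so the JIT can't discard the loop as dead code

static void ByteLoopNoToString()
{
    int x = 0;
    for (byte index = 0; index < 127; index++)
    {
        x += index;  // cheap work in place of index.ToString()
    }
    sink = x;
}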
Hardware / Software: Results were collected on a Core i7 975 at 3.33 GHz with turbo disabled and core affinity set to reduce the impact of other tasks. Performance settings were all set to maximum, with the virus scanner and unnecessary background tasks suspended. Windows 7 x64 Ultimate with 11 GB of spare RAM and very little IO activity. Run in release config built in VS 2008, without a debugger or profiler attached.
Repeatability: Originally repeated 10 times, changing the order of execution for each test (a sketch of one way to shuffle the order is at the end of this post). Variation was negligible, so I only posted my first result. Under max CPU load the ratio of execution times stayed consistent. Repeat runs on multiple x64 XP Xeon blades give roughly the same results after taking into account CPU generation and GHz.
Profiling: Redgate / JetBrains / SlimTune / CLR Profiler and my own profiler all indicate that the results are correct.
Debug Build: Using the debug settings in VS gives consistent results like Aaronaught's.
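On the repeatability point above, here is a sketch of one way to randomize the execution order between runs (the shuffle is my own illustration; Measure and the loop methods refer to Jon's harness above):

Action[] tests = { ByteLoop, ShortLoop, IntLoop, BackToBack, DelegateOverhead };
Random rng = new Random();
// Fisher-Yates shuffle so each run exercises the tests in a fresh order
for (int i = tests.Length - 1; i > 0; i--)
{
    int j = rng.Next(i + 1);
    Action tmp = tests[i];
    tests[i] = tests[j];
    tests[j] = tmp;
}
foreach (Action test in tests)
{
    Measure(test);
}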