Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

<= does work slower than <

Tags:

performance

c#

I found a few questions on SO regarding performance comparison of < and <= (this one was extremely downvoted) and I always found the same answer that there is no performance difference between the two.

I wrote a program for the comparison (not so working fiddle...copy to your machine to run it) in which I created two loops for (int i = 0; i <= 1000000000; i++ ) and for (int i = 0; i < 1000000001; i++ ) in two different methods.

I ran each method 100 times; took an average of the elapsed time and found that loop with <= operator ran slower than the one with < operator.

I ran the program multiple times and <= always took more time to complete. My results(im ms) were:

3018.73, 2778.22

2816.87, 2760.62

2859.02, 2797.05

My question is: If neither one is faster, why do I see differences in the results? Is there anything wrong with my program?

like image 671
Dumbledore Avatar asked Apr 17 '15 14:04

Dumbledore


2 Answers

Bench-marking is a fine art. What you describe is not physically possible, the <= and < operators just generate different processor instructions that execute at the exact same speed. I altered your program slightly, running DoIt ten times and dropping two zeros from the for() loop so I wouldn't have to wait for ever:

x86 jitter:

Less Than Equal To Method Time Elapsed: 0.5
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.46
Less Than Equal To Method Time Elapsed: 0.4
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.35
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.31
Less Than Method Time Elapsed: 0.32

x64 jitter:

Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.4
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.45
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.38
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35

The only real signal you get from this is the slow execution of the first DoIt(), also visible in your test results, that's jitter overhead. And the most important signal, it is noisy. The median value for both loops is about equal, the standard deviation is rather large.

Otherwise the kind of signal you always get when you micro-optimize, execution of code is not very deterministic. Short from .NET runtime overhead that's usually easy to eliminate, your program is not the only one that runs on your machine. It has to share the processor, just the WriteLine() call already has an affect. Executed by the conhost.exe process, runs concurrently with your test while your test code entered the next for() loop. And everything else that happens on your machine, kernel code and interrupt handlers also get their turn.

And codegen can play a role, one thing you should do for example is just swap the two calls. The processor itself is in general executes code very non-deterministically. The state of the processor caches and how much historical data was gathered by the branch prediction logic matters a great deal.

When I benchmark, I consider differences of 15% or less not statistically significant. Hunting down differences less than that is quite difficult, you have to very carefully study the machine code. Silly things like a branch target being mis-aligned or a variable not getting stored in a processor register can cause big effects in execution time. Not something you can ever fix, the jitter does not have enough knobs to tweak.

like image 137
Hans Passant Avatar answered Nov 03 '22 17:11

Hans Passant


First of all, there are many, many reasons to see variations in benchmarks, even when they're done right. Here are a few that come to mind:

  • Your computer is running a lot of other processes at the same time, switching things in and out of context, and so on. The operating system is constantly receiving and handling interrupts from various I/O devices, etc. All of these things can cause the computer to pause for periods of time that dwarf the running time for the actual code you're testing.
  • The JIT process can detect when a function has run a certain number of times, and apply additional optimizations to it based on that information. Things like loop unrolling can drastically reduce the number of jumps that the program has to make, which are significantly more expensive than typical CPU operations. Re-optimizing the instructions takes time when it first happens, and then speeds things up after that point.
  • Your hardware is trying to make additional optimizations, like branch prediction, to ensure that its pipeline is being used as efficiently as possible. (If it guesses correctly, it can basically pretend that it's going to do the i++ while it waits to see whether the < or <= comparison finishes, and then discard the result if it finds out it was wrong.) The impact of these optimizations depends on a lot of factors, and is not really easy to predict.

Secondly, it's actually really difficult to do benchmarking well. Here'a benchmark template that I've been using for a while now. It's not perfect, but it's pretty good at ensuring that any emerging patterns are unlikely to be based on order of execution or random chance:

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            int i = 0;
        }),
        new TimedAction("<", () =>
        {
           for (int i = 0; i < 1000001; i++)
            {}
        }),
        new TimedAction("<=", () =>
        {
           for (int i = 0; i <= 1000000; i++)
            {}
        }),
        new TimedAction(">", () =>
        {
           for (int i = 1000001; i > 0; i--)
            {}
        }),
        new TimedAction(">=", () =>
        {
           for (int i = 1000000; i >= 0; i--)
            {}
        })
    };
    const int TimesToRun = 10000; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}


#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for(int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult{Message = action.Message};
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for(int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message {get;set;}
    public double DryRun1 {get;set;}
    public double DryRun2 {get;set;}
    public double FullRun1 {get;set;}
    public double FullRun2 {get;set;}
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message {get;private set;}
    public Action Action {get;private set;}
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Here's the result I get when running this in LINQPad:

benchmark result

So you'll notice that there is some variation, particularly early on, but after running everything backwards and forwards enough times, there isn't a clear pattern emerging to show that one way is much faster or slower than another.

like image 32
StriplingWarrior Avatar answered Nov 03 '22 18:11

StriplingWarrior