
C#: Is this benchmarking class accurate?

I created a simple class to benchmark some methods of mine. But is it accurate? I am kind of new to benchmarking, timing, et cetera, so I thought I could ask for some feedback here. Also, if it is good, maybe somebody else can make use of it as well :)

public static class Benchmark
{
    public static IEnumerable<long> This(Action subject)
    {
        var watch = new Stopwatch();
        while (true)
        {
            watch.Reset();
            watch.Start();
            subject();
            watch.Stop();
            yield return watch.ElapsedTicks;
        }
    }
}

You can use it like this:

var avg = Benchmark.This(() => SomeMethod()).Take(500).Average();

Any feedback? Does it look to be pretty stable and accurate, or have I missed something?

Svish asked Oct 02 '09 01:10


3 Answers

It is about as accurate as you can get for a simple benchmark. But there are some factors not under your control:

  • load on the system from other processes
  • state of the heap before/during the benchmark

You can do something about that last point: a benchmark is one of the rare situations where calling GC.Collect can be defended. You might also call subject once beforehand to eliminate any JIT issues, but that requires calls to subject to be independent of each other.

public static IEnumerable<TimeSpan> This(Action subject)
{
    subject();     // warm up
    GC.Collect();  // compact Heap
    GC.WaitForPendingFinalizers(); // and wait for the finalizer queue to empty

    var watch = new Stopwatch();
    while (true)
    {
        watch.Reset();
        watch.Start();
        subject();
        watch.Stop();
        yield return watch.Elapsed;  // TimeSpan
    }
}

As a bonus, your class should check the System.Diagnostics.Stopwatch.IsHighResolution field. If it is false, you only have a very coarse resolution (about 20 ms).
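A minimal sketch of that check, assuming you just want to report what the timer can do before trusting any numbers. The class and method names (TimerInfo, NanosecondsPerTick, Describe) are made up for illustration; Stopwatch.IsHighResolution and Stopwatch.Frequency are the real static fields.

```csharp
using System;
using System.Diagnostics;

public static class TimerInfo
{
    // Length of one Stopwatch tick on this machine, in nanoseconds.
    // Stopwatch.Frequency is the number of ticks per second.
    public static double NanosecondsPerTick()
    {
        return 1e9 / Stopwatch.Frequency;
    }

    // Short human-readable summary to print before a benchmark run.
    public static string Describe()
    {
        return Stopwatch.IsHighResolution
            ? string.Format("high-resolution timer, {0:F1} ns per tick", NanosecondsPerTick())
            : "no high-resolution timer available, expect coarse results";
    }
}
```

Printing TimerInfo.Describe() once at the start of a run makes it obvious when the measurements cannot be finer than the timer itself.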

But on an ordinary PC, with many services running in the background, it is never going to be very accurate.

Henk Holterman answered Oct 16 '22 17:10


A couple of problems here.

First, remember that the first time you run the code, the transitive closure of its method calls will be jitted. That means that the first run is likely to have higher cost than every subsequent run. Depending on whether you are benchmarking "cold" timings or "hot" timings, this could make a difference. I have seen methods where the cost of jitting the method was higher than every other call to it put together!

Second, remember that the garbage collector runs on another thread. If you are making garbage in one run, then the cost of cleaning up that garbage might not be realized until subsequent runs. You are therefore failing to account for the total cost of one run, by foisting it off onto later runs.

Both of these are indicative of the weakness of all benchmarking: benchmarking is by nature unrealistic, and therefore of limited value. In real-world code, the GC is going to be running, the jitter is going to be running, and so on. It is frequently the case that benchmarked performance is nothing at all like real-world performance because the benchmark does not take into account the variability of real-world costs inherent in a large system. Rather than analyzing perf characteristics in isolation, I prefer to look at the perf characteristics of realistic scenarios actually faced by real customers.
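If you still want isolated numbers, the two effects above can at least be made visible rather than hidden in an average. The following is only a sketch under assumptions: ColdHotBenchmark, TimeOnce and Measure are hypothetical names, and forcing a collection before every run is exactly the kind of unrealistic setup described above, so treat the results as a lower bound rather than real-world cost.

```csharp
using System;
using System.Diagnostics;

public static class ColdHotBenchmark
{
    // Times a single call. A full collection is forced first so that garbage
    // left behind by an earlier run is not billed to this one.
    public static TimeSpan TimeOnce(Action subject)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        subject();
        watch.Stop();
        return watch.Elapsed;
    }

    // Reports the first ("cold", includes jitting) call separately from the
    // mean of the following ("hot") calls.
    public static void Measure(Action subject, int hotRuns,
                               out TimeSpan cold, out TimeSpan hotMean)
    {
        if (hotRuns <= 0) throw new ArgumentOutOfRangeException("hotRuns");

        cold = TimeOnce(subject);

        long totalTicks = 0;
        for (int i = 0; i < hotRuns; i++)
        {
            totalTicks += TimeOnce(subject).Ticks;
        }
        hotMean = TimeSpan.FromTicks(totalTicks / hotRuns);
    }
}
```

Comparing cold against hotMean shows how much of the first call was one-time cost, which tells you whether the distinction matters for your scenario.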

Eric Lippert answered Oct 16 '22 18:10


You should definitely return ElapsedMilliseconds instead of ElapsedTicks. The value returned by ElapsedTicks depends on the Stopwatch frequency, which can differ between systems. It will not necessarily correspond to the Ticks property of a TimeSpan or DateTime object.

See http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.elapsedticks.aspx.

If you do want the extra resolution of Ticks, you should return watch.Elapsed.Ticks (i.e. TimeSpan.Ticks) instead of watch.ElapsedTicks (this might be one of the subtlest potential errors in .NET). From MSDN:

Stopwatch ticks are different from DateTime.Ticks. Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
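To make the difference concrete, here is a small sketch of the conversion that watch.Elapsed does for you. TickConversion and ToTimeSpan are hypothetical names, and the frequency of 2,500,000 used in the comments is only an example value, not something you can rely on for any particular machine.

```csharp
using System;

public static class TickConversion
{
    // Converts raw Stopwatch ticks to a TimeSpan, given the timer frequency
    // in ticks per second (pass Stopwatch.Frequency for real measurements).
    public static TimeSpan ToTimeSpan(long stopwatchTicks, long frequency)
    {
        // One TimeSpan tick is always 100 ns, i.e. TimeSpan.TicksPerSecond per second.
        double timeSpanTicks = (double)stopwatchTicks * TimeSpan.TicksPerSecond / frequency;
        return TimeSpan.FromTicks((long)Math.Round(timeSpanTicks));
    }
}

// With an assumed frequency of 2,500,000 ticks per second:
//   2,500,000 Stopwatch ticks = 1 second       = 10,000,000 TimeSpan ticks
//         250 Stopwatch ticks = 100 microseconds =     1,000 TimeSpan ticks
```

So averaging ElapsedTicks gives a number whose unit changes from machine to machine, while averaging Elapsed.Ticks always gives 100-nanosecond units.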

Other than that, I guess your code is fine, although I think you'd be including some of the method-calling overhead in your measurements, which might be significant if the methods themselves take very little time to execute. Also, you probably would want to exclude the first call to the method from your calculated average, but I'm not sure how you'd do that in your class.
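On excluding the first call: since the class returns a lazy sequence, one possibility is to leave the class alone and drop the first measurement at the call site with Skip(1). This is a sketch, not a tested recommendation. The Benchmark class below repeats the one from the question so the example compiles on its own; BenchmarkUsage and AverageTicksSkippingFirst are names invented here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Benchmark
{
    // Same shape as the class in the question: an endless sequence of timings.
    public static IEnumerable<long> This(Action subject)
    {
        var watch = new Stopwatch();
        while (true)
        {
            watch.Reset();
            watch.Start();
            subject();
            watch.Stop();
            yield return watch.ElapsedTicks;
        }
    }
}

public static class BenchmarkUsage
{
    // Discards the first (jitted) measurement, then averages the next 'runs' ones.
    public static double AverageTicksSkippingFirst(Action subject, int runs)
    {
        return Benchmark.This(subject).Skip(1).Take(runs).Average();
    }
}
```

Note that subject is invoked runs + 1 times in total, because the skipped measurement still has to be taken.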

One last point, which would probably not be relevant to most uses of this class: Stopwatch runs a bit fast compared to the system time. On my computer, it gets about 5 seconds (that's seconds, not milliseconds) ahead after 24 hours, and on other machines this drift can be even larger. So it's a little misleading to say it's highly accurate, when it's actually just highly granular. For timing short-duration methods, this obviously wouldn't be a significant problem.

And one more last point, which certainly is relevant: I've often noticed while benchmarking that I'll get a bunch of running times that are all clustered within a narrow range of values (e.g. 80, 80, 79, 82 etc.), but occasionally something else will happen in Windows (like another program opening or my anti-virus kicking in) and I'll get a value wildly out of whack with the others (e.g. 80, 80, 79, 271, 80 etc.). I think a simple solution to this outlier problem is to use the median of your measurements instead of the mean. I don't know if LINQ supports this automatically or not.
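As far as I know, System.Linq.Enumerable has Average but no median operator, so you would have to write one yourself. A minimal sketch, with Statistics and Median being names chosen for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Statistics
{
    // Median of a finite sequence: the middle value after sorting, or the
    // mean of the two middle values when the count is even.
    public static double Median(IEnumerable<long> source)
    {
        long[] sorted = source.OrderBy(x => x).ToArray();
        if (sorted.Length == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}

// For the sample above, { 80, 80, 79, 271, 80 }:
//   mean   = 590 / 5 = 118
//   median = 80
```

With the class from the question this would be used as Statistics.Median(Benchmark.This(() => SomeMethod()).Take(500)). The Take is required, because the sequence is endless and the median needs all values before it can sort them.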

MusiGenesis answered Oct 16 '22 17:10