
Peculiar result relating to struct size and performance





I was curious about the overhead of a large structure vs. a small structure when using the operators + and * for math. So I made two structs: Small, with one double field (8 bytes), and Big, with 10 doubles (80 bytes). In all my operations I only manipulate one field, called x.

First, I defined mathematical operators in both structures, like:

public static Small operator +(Small a, Small b)
{
    return new Small(a.x + b.x);
}
public static Small operator *(double x, Small a)
{
    return new Small(x * a.x);
}

As expected, these operators use a lot of stack space copying fields around. I ran 5,000,000 iterations of a mathematical operation and got what I suspected (Big roughly 3 times slower):

public double TestSmall()
{
    pt.Start(); // pt = performance timing object
    Small r = new Small(rnd.NextDouble()); // rnd = random number generator
    for (int i = 0; i < N; i++)
    {
        a = 0.6 * a + 0.4 * r; // a is a class field of type Small (not a local)
    }
    return pt.ElapsedSeconds;
}

Results from the Release build (in seconds):

Small = 0.33940    Big = 0.98909    Big is slower by x2.91
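The question does not show the full struct definitions. Here is a minimal compilable sketch of what the operator-based test needs. Only the field name x and the field counts come from the question. Big's other nine field names (d1 to d9) and the one-argument constructors are hypothetical.

```csharp
using System;

// Hypothetical reconstruction: only `x` and the field counts come from the question.
public struct Small
{
    public double x; // 1 double = 8 bytes of data

    public Small(double x) { this.x = x; }

    // Operands are passed by value, so each call semantically copies the whole struct.
    public static Small operator +(Small a, Small b) => new Small(a.x + b.x);
    public static Small operator *(double s, Small a) => new Small(s * a.x);
}

public struct Big
{
    // 10 doubles = 80 bytes of data; d1..d9 are placeholder names.
    public double x, d1, d2, d3, d4, d5, d6, d7, d8, d9;

    public Big(double x)
    {
        this.x = x;
        d1 = d2 = d3 = d4 = d5 = d6 = d7 = d8 = d9 = 0.0;
    }

    // Same arithmetic as Small, but each by-value copy covers all 80 bytes.
    public static Big operator +(Big a, Big b) => new Big(a.x + b.x);
    public static Big operator *(double s, Big a) => new Big(s * a.x);
}

public static class OperatorDemo
{
    public static void Main()
    {
        var a = new Small(1.0);
        var r = new Small(0.5);
        a = 0.6 * a + 0.4 * r; // same update as the question's loop body
        Console.WriteLine(a.x);

        var b = new Big(1.0);
        var rb = new Big(0.5);
        b = 0.6 * b + 0.4 * rb;
        Console.WriteLine(b.x);
    }
}
```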

Now for the interesting part. I defined the same operations as static methods with ref arguments:

public static void Add(ref Small a, ref Small b, ref Small res)
{
    res.x = a.x + b.x;
}
public static void Scale(double x, ref Small a, ref Small res)
{
    res.x = x * a.x;
}

and ran the same number of iterations with this test code:

public double TestSmall2()
{
    pt.Start(); // pt = performance timing object
    Small a1 = new Small(); // local
    Small a2 = new Small(); // local
    Small r = new Small(rnd.NextDouble()); // rnd = random number generator
    for (int i = 0; i < N; i++)
    {
        Small.Scale(0.6, ref a, ref a1);  // a1 = 0.6 * a
        Small.Scale(0.4, ref r, ref a2);  // a2 = 0.4 * r
        Small.Add(ref a1, ref a2, ref a); // a = a1 + a2
    }
    return pt.ElapsedSeconds;
}
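The question says the same static ref methods were defined for Big. A minimal sketch of what those presumably look like follows, assuming Big's nine other fields are doubles with the hypothetical names d1 to d9. Because every struct is passed by reference, no call copies the 80-byte struct. Only the field x is read and written.

```csharp
using System;

// Hypothetical Big struct: only `x` comes from the question; d1..d9 are placeholders.
public struct Big
{
    public double x, d1, d2, d3, d4, d5, d6, d7, d8, d9;

    // Operands and result are passed by reference, so no per-call struct copy is made.
    public static void Add(ref Big a, ref Big b, ref Big res) { res.x = a.x + b.x; }
    public static void Scale(double s, ref Big a, ref Big res) { res.x = s * a.x; }
}

public static class RefDemo
{
    public static void Main()
    {
        Big a = new Big { x = 1.0 }, r = new Big { x = 0.5 };
        Big a1 = new Big(), a2 = new Big();
        Big.Scale(0.6, ref a, ref a1);  // a1 = 0.6 * a
        Big.Scale(0.4, ref r, ref a2);  // a2 = 0.4 * r
        Big.Add(ref a1, ref a2, ref a); // a = a1 + a2
        Console.WriteLine(a.x);
    }
}
```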

And the results (in seconds):

Small = 0.11765    Big = 0.07130    Big/Small = x0.61, so Big is actually faster

So compared to the memory-copy-intensive operators, I get a speedup of x3 (Small) and x14 (Big), which is great. But compare the Small times to the Big ones and you will see that Small takes about 1.65 times as long as Big (roughly 65% slower).
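The ratios quoted above follow directly from the two result lines (operator version vs. ref version, and Small vs. Big within the ref version):

```latex
\frac{0.33940}{0.11765} \approx 2.88 \ (\text{Small speedup}), \qquad
\frac{0.98909}{0.07130} \approx 13.87 \ (\text{Big speedup}), \qquad
\frac{0.11765}{0.07130} \approx 1.65 \ (\text{Small time / Big time, ref version})
```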

Can anyone explain this? Does it have to do with the CPU pipeline? Does spreading the operations out (spatially) in memory allow more efficient prefetching of data?

If you want to try this yourself, grab the code from my Dropbox: http://dl.dropbox.com/u/11487099/SmallBigCompare.zip

asked Sep 27 '10 18:09 by John Alexiou


2 Answers

There appear to be a couple of flaws in your benchmark.

  1. Use Stopwatch instead of the PerformanceTimer type. I'm not familiar with the latter, and it appears to be a third-party component. It's particularly troubling that it measures time in ElapsedSeconds instead of ElapsedMilliseconds.
  2. Run each test twice and count only the second run, to eliminate potential JIT costs.
  3. Marshal.SizeOf does not produce the actual size of the struct, just its marshalling size.
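The first two suggestions can be combined into a small harness. Here is a minimal sketch that times with System.Diagnostics.Stopwatch and discards a warm-up run so that JIT compilation is not measured. The workload is a hypothetical plain-double stand-in for the question's loop, and the names MeasureMs and Workload are made up.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    public const int N = 5_000_000;              // same iteration count as the question
    static readonly Random Rng = new Random(42);
    static double sink;                          // keeps results observable

    // Hypothetical workload mirroring a = 0.6 * a + 0.4 * r, on plain doubles.
    public static double Workload()
    {
        double r = Rng.NextDouble(), a = 0.0;
        for (int i = 0; i < N; i++)
            a = 0.6 * a + 0.4 * r;
        return a;
    }

    // Runs the test once untimed (JIT compilation happens here), then times a
    // second run with Stopwatch and returns the elapsed milliseconds.
    public static double MeasureMs(Func<double> test)
    {
        sink += test();                          // warm-up run, not counted
        var sw = Stopwatch.StartNew();
        sink += test();                          // measured run
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine($"High-resolution timer: {Stopwatch.IsHighResolution}");
        Console.WriteLine($"Workload: {MeasureMs(Workload):F3} ms (sink = {sink})");
    }
}
```

The same MeasureMs call can wrap TestSmall, TestSmall2, and their Big counterparts, so that every variant is measured after its own warm-up.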

After switching to Stopwatch, I see the benchmark performing as expected, producing nearly equal times for both types in the static ref case.

answered Oct 29 '22 16:10 by JaredPar


I can't reproduce your results. On my box, the "ref" version has basically the same performance for Big and Small, within tolerance.

(Running Release mode without the debugger attached, with 10 or 100 times as many iterations just to try to get a nice long run.)

Have you tried running your version for lots of iterations? Is it possible that while the tests are running, your CPU is gradually increasing its clock speed (as it spots that it's having to work hard)?
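One way to check the clock-speed hypothesis is to repeat the same test several times, with more iterations than the original 5,000,000, and print every timing. If the CPU is ramping up its clock, the earlier runs may be noticeably slower than the later ones. Here is a minimal sketch with a hypothetical plain-double workload. Run 0 also includes JIT compilation, so compare the later runs with each other.

```csharp
using System;
using System.Diagnostics;

public static class DriftCheck
{
    static double sink; // keeps results observable

    // Hypothetical workload; `iterations` is scaled up as in the answer (10x or 100x).
    public static double Workload(long iterations)
    {
        double r = 0.5, a = 0.0;
        for (long i = 0; i < iterations; i++)
            a = 0.6 * a + 0.4 * r;
        return a;
    }

    // Times `runs` back-to-back executions and returns each one in milliseconds.
    public static double[] TimeRuns(int runs, long iterations)
    {
        var times = new double[runs];
        for (int k = 0; k < runs; k++)
        {
            var sw = Stopwatch.StartNew();
            sink += Workload(iterations);
            sw.Stop();
            times[k] = sw.Elapsed.TotalMilliseconds;
        }
        return times;
    }

    public static void Main()
    {
        double[] times = TimeRuns(runs: 10, iterations: 50_000_000L); // 10x the question's N
        for (int k = 0; k < times.Length; k++)
            Console.WriteLine($"run {k}: {times[k]:F1} ms");
        Console.WriteLine($"(sink = {sink})");
    }
}
```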

answered Oct 29 '22 16:10 by Jon Skeet
