I was curious about the overhead of a large structure vs. a small structure when using the operators + and * for math. So I made two structs, one Small with 1 double field (8 bytes) and one Big with 10 doubles (80 bytes). In all my operations I only manipulate one field called x.
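For readers who do not want to download the project, the two structs can be pictured roughly as follows. This is only a sketch: the field x and the single-argument constructor come from the description above, while the extra field names in Big (f1 to f9) are placeholders chosen purely for illustration.

```csharp
using System;

// Hypothetical layout: only the field x and the one-argument constructor
// are taken from the question; the other field names are assumptions.
public struct Small
{
    public double x;                      // 1 double = 8 bytes

    public Small(double x) { this.x = x; }
}

public struct Big
{
    public double x;                      // the only field the tests touch
    public double f1, f2, f3, f4, f5, f6, f7, f8, f9; // filler up to 10 doubles = 80 bytes

    public Big(double x)
    {
        this.x = x;
        // A struct constructor has to assign every field.
        f1 = f2 = f3 = f4 = f5 = f6 = f7 = f8 = f9 = 0.0;
    }
}
```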
First I defined mathematical operators in both structures, like this:
    public static Small operator +(Small a, Small b)
    {
        return new Small(a.x + b.x);
    }
    public static Small operator *(double x, Small a)
    {
        return new Small(x * a.x);
    }
which, as expected, use a lot of stack memory copying fields around. I ran 5,000,000 iterations of a mathematical operation and got what I suspected (a 3x slowdown).
    public double TestSmall()
    {
        pt.Start(); // pt = performance timing object
        Small r = new Small(rnd.NextDouble()); // rnd = random number generator
        for (int i = 0; i < N; i++)
        {
            a = 0.6 * a + 0.4 * r; // a is a local field of type Small
        }
        pt.Stop();
        return pt.ElapsedSeconds;
    }
Results from the Release build (in seconds):
Small=0.33940 Big=0.98909 Big is Slower by x2.91
Now for the interesting part. I defined the same operations as static methods with ref arguments:
    public static void Add(ref Small a, ref Small b, ref Small res)
    {
        res.x = a.x + b.x;
    }
    public static void Scale(double x, ref Small a, ref Small res)
    {
        res.x = x * a.x;
    }
and ran the same number of iterations on this test code:
    public double TestSmall2()
    {
        pt.Start(); // pt = performance timing object
        Small a1 = new Small(); // local
        Small a2 = new Small(); // local
        Small r = new Small(rnd.NextDouble()); // rnd = random number generator
        for (int i = 0; i < N; i++)
        {
            Small.Scale(0.6, ref a, ref a1);
            Small.Scale(0.4, ref r, ref a2);
            Small.Add(ref a1, ref a2, ref a);
        }
        pt.Stop();
        return pt.ElapsedSeconds;
    }
And the results (in seconds):
Small=0.11765 Big=0.07130 Big is Slower by x0.61
So compared to the mem-copy-intensive operators I get speedups of about x3 (Small) and x14 (Big), which is great, but compare the Small time to the Big time and you will see that Small takes about 65% longer than Big.
Can anyone explain this? Does it have to do with the CPU pipeline, and does spreading the operations out spatially in memory make for more efficient pre-fetching of data?
If you want to try this for yourself grab the code from my dropbox http://dl.dropbox.com/u/11487099/SmallBigCompare.zip
There appear to be a couple of flaws in your benchmark.
First, use Stopwatch instead of the PerformanceTimer type. I'm not familiar with the latter and it appears to be a 3rd party component. It's particularly troubling that it's measuring time in ElapsedSeconds instead of ElapsedMilliseconds.
Second, Marshal.SizeOf does not produce the actual size of the struct, just its marshalling size.
After switching to Stopwatch I see the benchmark performing as expected, producing nearly equal times for both types in the static ref case.
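As a rough illustration of the Stopwatch suggestion, here is a minimal sketch of a timed loop. The class and method names are made up for this example, and plain doubles stand in for the Small struct so the block stays self-contained; it is not the code from the downloadable project.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original benchmark.
public static class StopwatchSketch
{
    // Runs n iterations of a = 0.6 * a + 0.4 * r and returns the elapsed
    // time in milliseconds as measured by System.Diagnostics.Stopwatch.
    public static double TimeUpdateLoop(int n, double r, out double result)
    {
        double a = 0.0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            a = 0.6 * a + 0.4 * r;
        }
        sw.Stop();

        result = a; // handed back so the loop is not dead code
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

Stopwatch.Elapsed is a TimeSpan, so TotalMilliseconds keeps the fractional part, whereas ElapsedMilliseconds is a whole number of milliseconds.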
I can't reproduce your results. On my box, the "ref" version has basically the same performance for Big and Small, within tolerance.
(Running Release mode without the debugger attached, with 10 or 100 times as many iterations just to try to get a nice long run.)
Have you tried running your version for lots of iterations? Is it possible that while the tests are running, your CPU is gradually increasing its clock speed (as it spots that it's having to work hard)?
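One way to guard against JIT and clock ramp-up effects is to run each test a few times untimed and then keep the best of several timed runs. The helper below is only a sketch of that idea; the name BestOf and its parameters are invented for this example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original benchmark.
public static class RepeatedTiming
{
    // Runs `test` `warmups` times untimed (JIT compilation, CPU clock ramp-up),
    // then returns the fastest of `trials` timed runs, in milliseconds.
    public static double BestOf(Action test, int warmups, int trials)
    {
        for (int i = 0; i < warmups; i++)
        {
            test();
        }

        double best = double.MaxValue;
        for (int t = 0; t < trials; t++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            test();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

The delegate is meant to wrap a whole test (the complete loop of N iterations), not a single iteration, so the cost of invoking it should be negligible next to the work being measured. Comparing the best time for the Big test against the best time for the Small test, taken in either order, should show whether the gap survives warm-up.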