The following short but complete example program
const long iterations = 1000000000;
T[] array = new T[1 << 20];
for (int i = 0; i < array.Length; i++)
{
array[i] = new T();
}
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
array[i % array.Length].Value0 = i;
}
Console.WriteLine("{0,-15} {1} {2:n0} iterations/s",
typeof(T).Name, sw.Elapsed, iterations * 1000d / sw.ElapsedMilliseconds);
with T
replaced by the following types
class SimpleClass struct SimpleStruct
{ {
public int Value0; public int Value0;
} }
class ComplexClass struct ComplexStruct
{ {
public int Value0; public int Value0;
public int Value1; public int Value1;
public int Value2; public int Value2;
public int Value3; public int Value3;
public int Value4; public int Value4;
public int Value5; public int Value5;
public int Value6; public int Value6;
public int Value7; public int Value7;
public int Value8; public int Value8;
public int Value9; public int Value9;
public int Value10; public int Value10;
public int Value11; public int Value11;
} }
yields the following interesting results on my machine (Windows 7 .NET 4.5 32-bit)
SimpleClass 00:00:10.4471717 95,721,260 iterations/s ComplexClass 00:00:37.8199150 26,441,736 iterations/s SimpleStruct 00:00:12.3075100 81,254,571 iterations/s ComplexStruct 00:00:32.6140182 30,661,679 iterations/s
Question 1: Why is ComplexClass
so much slower than SimpleClass
? The elapsed time seems to increase linearly with the number of fields in the class. Writing to the first field of a class with a lot of fields shouldn't be very different from writing to the first field of a class with only one field, no?
Question 2: Why is ComplexStruct
slower than SimpleStruct
? A look at the IL code shows that i
is written directly to the array, not to a local instance of ComplexStruct
that is then copied into the array. So there should be no overhead caused by copying more fields.
Bonus question: Why is ComplexStruct
faster than ComplexClass
?
Edit: Updated test results with a smaller array, T[] array = new T[1 << 8];
:
SimpleClass 00:00:13.5091446 74,024,724 iterations/s ComplexClass 00:00:13.2505217 75,471,698 iterations/s SimpleStruct 00:00:14.8397693 67,389,986 iterations/s ComplexStruct 00:00:13.4821834 74,172,971 iterations/s
So virtually no difference between SimpleClass
and ComplexClass
, and only a small difference between SimpleStruct
and ComplexStruct
. However, the performance significantly decreased for SimpleClass
and SimpleStruct
.
Edit: And now with T[] array = new T[1 << 16];
:
SimpleClass 00:00:09.7477715 102,595,670 iterations/s ComplexClass 00:00:10.1279081 98,745,927 iterations/s SimpleStruct 00:00:12.1539631 82,284,210 iterations/s ComplexStruct 00:00:10.5914174 94,419,790 iterations/s
The result for 1<<15
is like 1<<8
, and the result for 1<<17
is like 1<<20
.
Possible answer to Question 1:
Your CPU reads memory into its cache a page at a time.
With the larger data type, you can fit fewer objects onto each cache page. Even though you're only writing one 32-bit value, you still need the page in CPU cache. With the smaller objects, you can get through more loops before you next need to read from main memory.
I have no documentation to prove it, but I suppose that it could be a matter of locality. Being the complex classes wider in terms of memory, it'd take longer for the kernel to access distant areas of memory, on the heap or on the stack. To be objective, though, I have to say that the difference between your measures sounds really high for the problem to be the system's fault.
About the difference between classes and structures, I cannot document this either, but it may be because, for the same principle as before, the stack is cached more often than heap regions, leading to less cache misses.
Have you ran the program with active optimizations?
EDIT: I've made a small test on ComplexStruct
and used the StructLayoutAttribute
with LayoutKind.Explicit
as the parameter, then added a FieldOffsetAttribute
with 0 as a parameter to each field of the structure. The times were cut significantly, and I think they were approximately the same as the SimpleStruct
's ones. I ran it in Debug mode, debugger on, no optimizations. While the struct retained its fields, its size in memory was cut and so were the times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With