Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Field access through array is slower for types with more fields

The following short but complete example program

const long iterations = 1000000000;

T[] array = new T[1 << 20];
for (int i = 0; i < array.Length; i++)
{
    array[i] = new T();
}

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    array[i % array.Length].Value0 = i;
}

Console.WriteLine("{0,-15}  {1}   {2:n0} iterations/s",
    typeof(T).Name, sw.Elapsed, iterations * 1000d / sw.ElapsedMilliseconds);

with T replaced by the following types

class SimpleClass                   struct SimpleStruct
{                                   {
    public int Value0;                  public int Value0;
}                                   }

class ComplexClass                  struct ComplexStruct
{                                   {
    public int Value0;                  public int Value0;
    public int Value1;                  public int Value1;
    public int Value2;                  public int Value2;
    public int Value3;                  public int Value3;
    public int Value4;                  public int Value4;
    public int Value5;                  public int Value5;
    public int Value6;                  public int Value6;
    public int Value7;                  public int Value7;
    public int Value8;                  public int Value8;
    public int Value9;                  public int Value9;
    public int Value10;                 public int Value10;
    public int Value11;                 public int Value11;
}                                   }

yields the following interesting results on my machine (Windows 7 .NET 4.5 32-bit)

SimpleClass      00:00:10.4471717   95,721,260 iterations/s
ComplexClass     00:00:37.8199150   26,441,736 iterations/s
SimpleStruct     00:00:12.3075100   81,254,571 iterations/s
ComplexStruct    00:00:32.6140182   30,661,679 iterations/s

Question 1: Why is ComplexClass so much slower than SimpleClass? The elapsed time seems to increase linearly with the number of fields in the class. Writing to the first field of a class with a lot of fields shouldn't be very different from writing to the first field of a class with only one field, no?

Question 2: Why is ComplexStruct slower than SimpleStruct? A look at the IL code shows that i is written directly to the array, not to a local instance of ComplexStruct that is then copied into the array. So there should be no overhead caused by copying more fields.

Bonus question: Why is ComplexStruct faster than ComplexClass?


Edit: Updated test results with a smaller array, T[] array = new T[1 << 8];:

SimpleClass      00:00:13.5091446   74,024,724 iterations/s
ComplexClass     00:00:13.2505217   75,471,698 iterations/s
SimpleStruct     00:00:14.8397693   67,389,986 iterations/s
ComplexStruct    00:00:13.4821834   74,172,971 iterations/s

So virtually no difference between SimpleClass and ComplexClass, and only a small difference between SimpleStruct and ComplexStruct. However, the performance significantly decreased for SimpleClass and SimpleStruct.


Edit: And now with T[] array = new T[1 << 16];:

SimpleClass      00:00:09.7477715  102,595,670 iterations/s
ComplexClass     00:00:10.1279081  98,745,927 iterations/s
SimpleStruct     00:00:12.1539631  82,284,210 iterations/s
ComplexStruct    00:00:10.5914174  94,419,790 iterations/s

The result for 1<<15 is like 1<<8, and the result for 1<<17 is like 1<<20.

like image 534
dtb Avatar asked Dec 14 '12 23:12

dtb


2 Answers

Possible answer to Question 1:

Your CPU reads memory into its cache a page at a time.

With the larger data type, you can fit fewer objects onto each cache page. Even though you're only writing one 32-bit value, you still need the page in CPU cache. With the smaller objects, you can get through more loops before you next need to read from main memory.

like image 187
Carson63000 Avatar answered Nov 13 '22 19:11

Carson63000


I have no documentation to prove it, but I suppose that it could be a matter of locality. Being the complex classes wider in terms of memory, it'd take longer for the kernel to access distant areas of memory, on the heap or on the stack. To be objective, though, I have to say that the difference between your measures sounds really high for the problem to be the system's fault.

About the difference between classes and structures, I cannot document this either, but it may be because, for the same principle as before, the stack is cached more often than heap regions, leading to less cache misses.

Have you ran the program with active optimizations?

EDIT: I've made a small test on ComplexStruct and used the StructLayoutAttribute with LayoutKind.Explicit as the parameter, then added a FieldOffsetAttribute with 0 as a parameter to each field of the structure. The times were cut significantly, and I think they were approximately the same as the SimpleStruct's ones. I ran it in Debug mode, debugger on, no optimizations. While the struct retained its fields, its size in memory was cut and so were the times.

like image 36
Mir Avatar answered Nov 13 '22 18:11

Mir