I have some simple code:
    Int32[] tmpInt = new Int32[32];
    long lStart = DateTime.Now.Ticks;
    Thread t1 = new Thread(new ThreadStart(delegate()
    {
        for (Int32 i = 0; i < 100000000; i++)
            Interlocked.Increment(ref tmpInt[5]);
    }));
    Thread t2 = new Thread(new ThreadStart(delegate()
    {
        for (Int32 i = 0; i < 100000000; i++)
            Interlocked.Increment(ref tmpInt[20]);
    }));
    t1.Start();
    t2.Start();
    t1.Join();
    t2.Join();
    Console.WriteLine(((DateTime.Now.Ticks - lStart)/10000).ToString());
This takes ~3 seconds on my Core 2 Duo. If I change the index in t1 to tmpInt[4], it takes ~5.5 seconds.
Anyway, the first cache line ends at index 4. Since a cache line is 64 bytes and five Int32s (indices 0 through 4) take up only 20 bytes, there must be 44 bytes of metadata and/or padding before the actual array data.
Another pair of indices I tested was 5 and 21. 5 and 21 take ~3 seconds, but 5 and 20 take ~5.5 seconds; that is because index 20 shares a cache line with index 5, as both fall within the same 64 bytes.
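The cache-line reasoning above can be written out as a small calculation. This is only a sketch of the arithmetic: the 44-byte offset is the figure inferred in the question (not something the CLR guarantees), and the names CacheLineSketch, AssumedOffsetBytes and LineOf are made up for illustration.

```csharp
using System;

static class CacheLineSketch
{
    const int CacheLineBytes = 64;      // cache line size stated in the question
    const int AssumedOffsetBytes = 44;  // ASSUMPTION: offset inferred in the question, not a CLR guarantee
    const int ElementBytes = sizeof(int);

    // Cache line (counted from the line holding the start of the object)
    // that contains the element at the given index, under the assumed offset.
    public static int LineOf(int index)
    {
        return (AssumedOffsetBytes + index * ElementBytes) / CacheLineBytes;
    }

    static void Main()
    {
        foreach (int index in new[] { 4, 5, 20, 21 })
            Console.WriteLine("index {0} -> line {1}", index, LineOf(index));
    }
}
```

Under that assumed offset, index 4 is the last element on line 0, indices 5 and 20 both land on line 1, and index 21 is the first element on line 2. The real placement depends on where the runtime happened to allocate the array.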
So my question is: how much data does .NET reserve before an array, and does this amount change between 32-bit and 64-bit systems?
Thanks :-)
In addition to the answer here: https://stackoverflow.com/a/1589806/543814
My tests indicated what I expected, on 32-bit [64-bit]:
In conclusion, there are 4 possibilities:
12 bytes (32-bit value array)
16 bytes (32-bit reference array)
20 bytes (64-bit value array)
28 bytes (64-bit reference array)
Something that I missed in the past: on a 64-bit machine with the project setting 'prefer 32-bit' enabled (default), 32-bit applies!
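To see which of those cases applies to a running program, it can help to print the process bitness rather than rely on the machine's architecture. A minimal sketch, assuming .NET Framework 4.0 or later (where Environment.Is64BitProcess is available); the class name BitnessCheck is just for illustration:

```csharp
using System;

static class BitnessCheck
{
    static void Main()
    {
        // True only if this process itself runs as 64-bit;
        // with 'prefer 32-bit' enabled this can be false on a 64-bit machine.
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);

        // Describes the operating system, not the process.
        Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);

        // 4 in a 32-bit process, 8 in a 64-bit process.
        Console.WriteLine("Pointer size:   " + IntPtr.Size + " bytes");
    }
}
```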
When the CPU attempts to load your array and suffers a cache miss, it fetches the block of memory containing your array, but not necessarily STARTING with it. .NET makes no guarantee that your array will be cache-line aligned.
To answer your question, the 44 bytes of padding is mostly other data from the associated page that happened to be in the same cache line.
Edit: http://msdn.microsoft.com/en-us/magazine/cc163791.aspx seems to indicate that an array has 16 bytes of additional storage: 4 bytes are the sync block index, 4 bytes are used for the type handle metadata, and the rest is the object itself.
As a side comment, it's hard to say for certain that false sharing is responsible for your delay here. It's likely given the timings, but you should use a good profiler to examine the cache-miss rate. If it jumps for your given case, you can be pretty sure you're seeing false sharing in play.
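Short of a profiler, a rough sanity check is to time several index pairs in one run and see whether the slow cases line up with the pairs you expect to share a cache line. The following is only a sketch of the question's experiment wrapped in a helper: FalseSharingProbe and Time are hypothetical names, the iteration count is arbitrary, and wall-clock timings are noisy, so this does not replace measuring cache misses directly.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FalseSharingProbe
{
    // Runs two threads that each increment one slot of a fresh 32-element array
    // and returns the elapsed wall-clock time in milliseconds.
    // indexA and indexB must be different.
    public static long Time(int indexA, int indexB, int iterations)
    {
        if (indexA == indexB)
            throw new ArgumentException("indices must differ");

        int[] counters = new int[32];
        Thread a = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                Interlocked.Increment(ref counters[indexA]);
        });
        Thread b = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                Interlocked.Increment(ref counters[indexB]);
        });

        Stopwatch watch = Stopwatch.StartNew();
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        watch.Stop();

        // Each slot is written by exactly one thread, so both must equal iterations.
        if (counters[indexA] != iterations || counters[indexB] != iterations)
            throw new InvalidOperationException("unexpected counter value");

        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        // The index pairs mentioned in the question.
        int[][] pairs = { new[] { 4, 20 }, new[] { 5, 20 }, new[] { 5, 21 } };
        foreach (int[] pair in pairs)
            Console.WriteLine("{0} and {1}: {2} ms", pair[0], pair[1], Time(pair[0], pair[1], 100000000));
    }
}
```

Each call allocates a new array, so the array may sit at a different offset within a cache line from one call to the next; repeating the whole run a few times gives a better picture than a single measurement.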