I wrote a multithreaded app to benchmark the speed of LOCK CMPXCHG (x86 assembly).
On my machine (a dual-core Core 2), with two threads running and accessing the same variable, I can perform about 40M ops/second.
Then I gave each thread its own variable to operate on. Obviously this means there's no lock contention between the threads, so I expected a speedup. However, the speed didn't change. Why?
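For reference, here is a minimal sketch of that kind of benchmark, not the original code (which used inline assembly). It assumes C11 `<stdatomic.h>` and POSIX threads; `atomic_compare_exchange_strong` typically compiles to LOCK CMPXCHG on x86. The names `cas_worker`, `worker_arg` and `run_cas_benchmark` are hypothetical. Passing the same pointer twice gives the "shared variable" case; passing two different pointers gives the "unique variable" case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical per-thread argument: which counter to hammer and how often. */
typedef struct {
    atomic_long *target;
    long iterations;
} worker_arg;

/* Increment *target `iterations` times using a CAS retry loop. */
static void *cas_worker(void *p) {
    worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        long expected = atomic_load_explicit(arg->target, memory_order_relaxed);
        /* On failure, `expected` is refreshed with the current value; retry. */
        while (!atomic_compare_exchange_strong(arg->target, &expected, expected + 1)) {
        }
    }
    return NULL;
}

/* Thread 0 works on a, thread 1 on b (pass a == b for the shared case).
 * Returns total CAS increments per second, or -1.0 if a thread can't start. */
double run_cas_benchmark(atomic_long *a, atomic_long *b, long iterations) {
    assert(a != NULL && b != NULL);
    pthread_t threads[2];
    worker_arg args[2] = {{a, iterations}, {b, iterations}};
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&threads[0], NULL, cas_worker, &args[0]) != 0)
        return -1.0;
    if (pthread_create(&threads[1], NULL, cas_worker, &args[1]) != 0) {
        pthread_join(threads[0], NULL); /* don't leak the first thread */
        return -1.0;
    }
    pthread_join(threads[0], NULL);
    pthread_join(threads[1], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9; /* guard against a zero-length measurement */
    return (2.0 * (double)iterations) / secs;
}
```

Note that two separately declared variables (for example two adjacent globals or stack locals) will often sit in the same cache line, so the "unique variable" case may not be as independent as it looks.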
If you have two threads simultaneously writing to data that's on the same cache line, you get false sharing: every write by one core invalidates the other core's copy of that line, so the line keeps bouncing between the two caches even though the threads never touch the same variable.
Make sure that the unique variables are allocated in different blocks of memory (at least 128 bytes apart, say) to make sure that this isn't the issue you're having.
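One way to do that in C11 is to give each counter its own aligned slot. This is a sketch, assuming a compiler that supports a 128-byte `_Alignas` on static storage (GCC and Clang do). The 128-byte spacing follows the suggestion above; a Core 2 cache line is 64 bytes, so this leaves a margin. `padded_counter`, `unpadded_pair` and `counters` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed spacing, per the "at least 128 bytes apart" advice. */
#define COUNTER_SPACING 128

/* Each counter starts on a 128-byte boundary, and sizeof rounds up to the
 * alignment, so neighbouring array elements never share a cache line. */
typedef struct {
    _Alignas(COUNTER_SPACING) atomic_long value;
} padded_counter;

static_assert(sizeof(padded_counter) == COUNTER_SPACING,
              "each padded_counter should occupy exactly one spacing unit");

/* For contrast: two plain atomics packed back to back, which will normally
 * land in the same 64-byte line and suffer false sharing. */
typedef struct {
    atomic_long a;
    atomic_long b;
} unpadded_pair;

/* One padded slot per thread; static storage gets the full 128-byte alignment. */
padded_counter counters[2];
```

Each thread would then operate on `counters[0].value` or `counters[1].value` respectively, instead of two adjacent variables such as the members of `unpadded_pair`.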
DDJ has a nice article describing the horrible effects of false sharing: http://www.drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=217500206
Here's Wikipedia's entry on it: http://en.wikipedia.org/wiki/False_sharing