
Threaded execution speed of LOCK CMPXCHG

I wrote a multi-threaded app to benchmark the speed of running LOCK CMPXCHG (x86 ASM).

On my machine (a dual-core Core 2), with 2 threads running and accessing the same variable, I can perform about 40M ops/second.

Then I gave each thread a unique variable to operate on. Obviously this means there's no locking contention between the threads, so I expected a speed improvement. However, the speed didn't change. Why?

IamIC asked Aug 06 '10 19:08


1 Answer

If you have 2 threads simultaneously accessing data that's on the same cache line, you get false sharing: every write by one core invalidates the other core's copy of that line, so the line keeps bouncing between the two caches even though the threads never touch the same variable.

To rule this out, make sure that the unique variables are allocated in different blocks of memory (at least 128 bytes apart, say).
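As a rough illustration of that spacing, here is a minimal C11 sketch. The names `padded_counter`, `counters`, `cas_increment`, and `SLOT_SPACING` are made up for this example, and the 128-byte figure is just the conservative guess from above, not a measured cache-line size for your CPU. The compare-and-swap loop typically compiles to LOCK CMPXCHG on x86, but that depends on your compiler.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed spacing: 128 bytes, the conservative figure suggested above. */
#define SLOT_SPACING 128

/* One counter per thread. alignas pads and aligns each slot so that
   no two counters can end up in the same 128-byte block. */
typedef struct {
    alignas(SLOT_SPACING) atomic_long value;
} padded_counter;

/* The compiler rounds the struct size up to its alignment. */
static_assert(sizeof(padded_counter) == SLOT_SPACING,
              "each slot should occupy exactly one 128-byte block");

/* Hypothetical layout: slot 0 for thread 0, slot 1 for thread 1. */
static padded_counter counters[2];

/* Increment *p `iterations` times using a compare-and-swap loop. */
static void cas_increment(atomic_long *p, long iterations)
{
    for (long i = 0; i < iterations; ++i) {
        long expected = atomic_load_explicit(p, memory_order_relaxed);
        /* On failure, expected is refreshed with the current value,
           so the loop simply retries with the new value. */
        while (!atomic_compare_exchange_weak(p, &expected, expected + 1)) {
        }
    }
}
```

Each benchmark thread would then call `cas_increment(&counters[i].value, n)` with its own index `i`. If throughput rises noticeably compared with two adjacent plain variables, false sharing was likely the cause.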

DDJ has a nice article describing the horrible effects of false sharing: http://www.drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=217500206

Here's Wikipedia's entry on it: http://en.wikipedia.org/wiki/False_sharing

Gabe answered Oct 16 '22 21:10