Why is Having More Threads than Cores Faster?

I've implemented a multithreaded version of PageRank. I'm running it on a 4-core Q6600. When I run it with 4 threads, I get:

real    6.968s
user   26.020s
sys     0.050s

When I run with 128 threads I get:

real    0.545s
user    1.330s
sys     0.040s

This makes no sense to me. The basic algorithm is a sum-reduce:

1. All threads sum a subset of the input;
2. Synchronize;
3. Each thread then accumulates part of the results from the other threads;
4. The main thread sums an intermediate value from all the threads and then determines whether to continue.

Profiling hasn't helped. I'm not sure what data would be helpful to understand my code - please just ask.

It really has me puzzled.

asked May 13 '11 04:05

laurencer

1 Answer

Deliberately creating more threads than processors is a standard technique for making use of "spare cycles": when a thread is blocked waiting for something, whether that's I/O, a mutex, or something else, the other threads give the processor useful work to do.

If your threads are doing I/O then this is a strong contender for the speed-up: as each thread blocks waiting for the I/O, the processor can run the other threads until they too block for I/O, hopefully by which time the data for the first thread is ready, and so forth.
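To see the blocking effect in isolation, here is a minimal sketch in C with pthreads. It is not the asker's code: `run_blocking_tasks`, `blocking_worker`, and `struct blocking_job` are hypothetical names, and `nanosleep` stands in for whatever I/O or lock wait the real threads might be doing. It assumes the task count divides evenly among the threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

struct blocking_job {
    int tasks;      /* number of simulated waits this thread performs */
    long sleep_ms;  /* length of each wait in milliseconds */
};

/* Stand-in for blocking I/O: the thread sleeps, leaving its core free. */
static void *blocking_worker(void *arg)
{
    const struct blocking_job *job = arg;
    struct timespec delay = {
        job->sleep_ms / 1000,
        (job->sleep_ms % 1000) * 1000000L
    };
    for (int i = 0; i < job->tasks; ++i)
        nanosleep(&delay, NULL);
    return NULL;
}

/* Splits total_tasks waits across nthreads and returns wall-clock seconds.
   Assumes total_tasks is a multiple of nthreads. */
double run_blocking_tasks(int nthreads, int total_tasks, long sleep_ms)
{
    pthread_t tid[MAX_THREADS];
    struct blocking_job job = { total_tasks / nthreads, sleep_ms };
    struct timespec start, end;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int t = 0; t < nthreads; ++t)
        pthread_create(&tid[t], NULL, blocking_worker, &job);
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `run_blocking_tasks(1, 16, 10)` and then `run_blocking_tasks(16, 16, 10)` should show the second call finishing well ahead of the first, even on a 4-core machine, because sleeping threads do not occupy a core. A purely CPU-bound loop would not benefit in the same way.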

Another possible cause of the speed-up is that your threads are experiencing false sharing. If you have two threads writing data to different values on the same cache line (e.g. adjacent elements of an array), the cores involved stall whilst the cache line is transferred back and forth between them. By adding more threads you decrease the likelihood that they are operating on adjacent elements, and thus reduce the chance of false sharing. You can easily test this by adding extra padding to your data elements so they are each at least 64 bytes in size (the typical cache line size). If your 4-thread code speeds up, this was the problem.
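The padding test might look something like the sketch below. The asker's data layout isn't shown, so this is an assumption about its shape: `struct padded_sum`, `struct job`, `sum_range`, and `parallel_sum` are hypothetical names, and the 64-byte `CACHE_LINE` is the typical figure mentioned above rather than a value queried from the hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size */
#define NTHREADS   4

/* One accumulator per thread, padded out to a full cache line so that
   neighbouring array elements never place their values on the same line. */
struct padded_sum {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
};

static_assert(sizeof(struct padded_sum) == CACHE_LINE,
              "padded_sum should fill exactly one cache line");

struct job {
    const double *data;
    size_t begin, end;       /* half-open range [begin, end) */
    struct padded_sum *out;  /* this thread's private slot */
};

static void *sum_range(void *arg)
{
    struct job *j = arg;
    j->out->value = 0.0;
    for (size_t i = j->begin; i < j->end; ++i)
        j->out->value += j->data[i];  /* writes go to this thread's slot only */
    return NULL;
}

/* Sums data[0..n) using NTHREADS threads and per-thread padded slots. */
double parallel_sum(const double *data, size_t n)
{
    pthread_t tid[NTHREADS];
    struct job jobs[NTHREADS];
    struct padded_sum sums[NTHREADS];
    size_t chunk = n / NTHREADS;
    double total = 0.0;

    for (int t = 0; t < NTHREADS; ++t) {
        jobs[t].data = data;
        jobs[t].begin = (size_t)t * chunk;
        jobs[t].end = (t == NTHREADS - 1) ? n : (size_t)(t + 1) * chunk;
        jobs[t].out = &sums[t];
        pthread_create(&tid[t], NULL, sum_range, &jobs[t]);
    }
    for (int t = 0; t < NTHREADS; ++t) {
        pthread_join(tid[t], NULL);
        total += sums[t].value;
    }
    return total;
}
```

To compare, time this version against one where `sums` is a plain `double[NTHREADS]`. If the padded version is markedly faster with 4 threads, false sharing is a likely culprit.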

answered Sep 28 '22 06:09

Anthony Williams

Tags: performance, multithreading, pthreads
