Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multi-threaded performance and profiling

I have a program that scales badly to multiple threads, although – theoretically – it should scale linearly: it's a calculation that splits into smaller chunks and doesn't need system calls, library calls, locking, etc. Running with four threads is only about twice as fast as running with a single thread (on a quad core system), while I'd expect a number closer to four times as fast.

The run time of the implementations with pthreads, C++0x threads and OpenMP agree.

In order to pinpoint the cause, I tried gprof (useless) and valgrind (I didn't see anything obvious). How can I effectively benchmark what's causing the slowdown? Any generic ideas as to its possible causes?

— Update —

The calculation involves Monte Carlo integration and I noticed that an unreasonable amount of time is spent generating random numbers. While I don't know yet why this happens with four threads, I noticed that the random number generator is not reentrant. When using mutexes, the running time explodes. I'll reimplement this part before checking for other problems.

I did reimplement the sampling classes which did improve performance substantially. The remaining problem was, in fact, contention of the CPU caches (it was revealed by cachegrind as Evgeny suspected.)

like image 633
jk4736 Avatar asked Feb 12 '12 17:02

jk4736


People also ask

What is multi threaded performance?

Multithreading is the ability of a program or an operating system to enable more than one user at a time without requiring multiple copies of the program running on the computer. Multithreading can also handle multiple requests from the same user.

Does multithreading improve performance on single core?

Even on a single-core platform, multithreading can boost the performance of such applications because individual threads are able to perform IO (causing them to block), while others within the same process continue to run.

Do more threads always mean better performance?

Threading is the method of choice for extracting performance from multi-core chips. It might seem that if a little threading is good, then a lot must be better. In fact, having too many threads can bog down a program.


1 Answers

You can use oprofile. Or a poor man's pseudo-profiler: run the program under gdb, stop it and look where it is stopped. "valgrind --tool=cachegrind" will show you how efficiently CPU cache is used.

Monte Carlo integration seems to be very memory-intensive algorithm. Try to estimate, how memory bandwidth is used. It may be the limiting factor for your program's performance. Also if your system is only 2-core with hyperthreading, it should not work much faster with 4 threads, comparing with 2 threads.

like image 147
Evgeny Kluev Avatar answered Sep 19 '22 16:09

Evgeny Kluev