Is there a way to get valgrind to use multiple processors?
I'm doing some bottleneck profiling with valgrind's callgrind and noticed significantly different resource usage behavior in my application vs when run outside of valgrind/callgrind.
When run outside valgrind, it maxes out several processors, but run inside valgrind only uses one. This makes me worry that my bottle necks will be in different places, and thus invalidate my profiling.
The main thing to point out with respect to threaded programs is that your program will use the native threading library, but Valgrind serialises execution so that only one (kernel) thread is running at a time.
Valgrind works by doing a just-in-time (JIT) translation of the input program into an equivalent version that has additional checking. For the memcheck tool, this means it literally looks at the x86 code in the executable, and detects what instructions represent memory accesses.
Valgrind doesn't actually execute your code natively - instead it runs it inside a simulator. That's why it's so slow. So, there's no way to make it run faster, and still get the benefit of Valgrind. Your best bet is to set the ulimit so that your program generates a core file when it crashes.
possibly lost: heap-allocated memory that was never freed to which valgrind cannot be sure whether there is a pointer or not. still reachable: heap-allocated memory that was never freed to which the program still has a pointer at exit (typically this means a global variable points to it).
Take a look at:
http://valgrind.org/docs/manual/manual-core.html#manual-core.pthreads_perf_sched
They added:
--fair-sched option
It may help.
According to the Valgrind Docs, they do not support multiple processors:
The main thing to point out with respect to threaded programs is that your program will use the native threading library, but Valgrind serialises execution so that only one (kernel) thread is running at a time. This approach avoids the horrible implementation problems of implementing a truly multithreaded version of Valgrind, but it does mean that threaded apps run only on one CPU, even if you have a multiprocessor or multicore machine.
Valgrind doesn't schedule the threads itself. It merely ensures that only one thread runs at once, using a simple locking scheme. The actual thread scheduling remains under control of the OS kernel. What this does mean, though, is that your program will see very different scheduling when run on Valgrind than it does when running normally. This is both because Valgrind is serialising the threads, and because the code runs so much slower than normal.
This difference in scheduling may cause your program to behave differently, if you have some kind of concurrency, critical race, locking, or similar, bugs. In that case you might consider using the tools Helgrind and/or DRD to track them down.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With