We have a complex program that is working well on heavy duty input (any input actually) with no multithreading implemented.
We've implemented multithreading with a threadpool, and given these input parameters I get these results:
(Note: Where I say no errors, it means I've tested with valgrind -v
and when I say no memory leaks, it means I've tested it with valgrind --leak-check=full -v
).
valgrind -v
with > 1 workers the program completes successfully. Also, no errors are printed from valgrind, that is ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
. Now that I don't get any errors from valgrind to start with, what can I do to find the memory corruption problem in this complex and big application?
DevelopmentEnvironment:
Ubuntu, 64bit, gcc version: 4.7.2 and 4.8.1 (different computers, newer version of Ubuntu).
Valgrind Memcheck is a tool that detects memory leaks and memory errors. Some of the most difficult C bugs come from mismanagement of memory: allocating the wrong size, using an uninitialized pointer, accessing memory after it was freed, overrunning a buffer, and so on.
This error is caused if you forget to initialize variables before using or accessing them. You can usually re-run valgrind with the flag --track-origins=yes to see where the uninitialized value came from.
If you compile your program with the -g flag, Valgrind will show you the function names and line numbers where errors occur. Sometimes the actual bug occurs on a different line (particularly for uninitialized value errors) but the line number Valgrind tells you is a good starting point.
Now that I don't get any errors from valgrind to start with, what can I do to find the memory corruption problem in this complex and big application?
Well let me describe to you what I did to find memory leaks in Microsoft's implementation of JavaScript back in the 1990s.
First I ensured that in the debug version of my program, as many memory allocations as possible were being routed to the same helper methods. That is, I redefined malloc
, new
, etc, to all be synonyms for an allocator that I wrote myself.
That allocator was just a thin shell around an operating system virtual heap memory allocator, but it had some extra smarts. It allocated extra memory at the beginning and end of the block and filled that with sentinel values, a threadsafe count of the number of allocations so far, and a threadsafe doubly-linked list of all allocations. The "free" routine would verify that the sentinel values on both sides were still intact; if not, then there's a memory corruption somewhere. It would unlink the block from the linked list and free it.
At any point I could ask the memory manager for a list of all the outstanding blocks in memory in the order they had been allocated. Any items left in the list when the DLL was unloaded were memory leaks.
Those tools enabled me to find memory leaks and memory corruptions in real time very easily.
With > 1 workers, I get: a. usually heap-corruption error, b.double-free. When running with valgrind -v with > 1 workers the program completes successfully
Based on the above symptoms, it looks to me that there is clearly some sort of synchronization problem is happening in your program. It looks like your program is sharing the heap memory address between the threads and hence whenever there is some data race you are facing problem.
You have also mentioned that when you are running valgrind -v, then your program is completing successfully. This indicates that your program has synchronization problem and that too is dependant on the sequence/timing. These are one of the most difficult bug to find out.We should also remember that dynamic tools would not give any warning until program goes and execute something wrong. I mean there could be problem in the program, but sequence of execution(as there is some timing related problem) determined whether tools would capture those failure or not.
Having said that, I think there is not sort cut way to find such bugs in big programs.However I strongly suspect that there is some data racing scenario which is leading to memory corruption/double free. So you may want to use Helgrind to check/find data racing/threading problem which might be leading to memory corruption.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With