Having a hard time tracking down memory corruption: under Valgrind the program runs correctly with no errors

We have a complex program that works well on heavy-duty input (on any input, actually) when no multithreading is used.
We've now implemented multithreading with a thread pool, and for the inputs below I get the following results.
(Note: where I say "no errors", I mean I tested with valgrind -v; where I say "no memory leaks", I mean I tested with valgrind --leak-check=full -v.)

  1. small_file: Runs successfully with more than 1 worker (thread); no valgrind errors, no memory leaks.
  2. medium_file: With 1 worker it runs successfully, with no errors or memory leaks. With > 1 workers I get (a) usually a heap-corruption error, or (b) a double free. When run under valgrind -v with > 1 workers, the program completes successfully and valgrind prints no errors, that is: ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2).

Since valgrind doesn't report any errors to begin with, what can I do to find the memory corruption problem in this large, complex application?

Development environment:
Ubuntu, 64-bit; gcc versions 4.7.2 and 4.8.1 (on different computers, one with a newer version of Ubuntu).

Chris asked Mar 24 '14 18:03




2 Answers

Since valgrind doesn't report any errors to begin with, what can I do to find the memory corruption problem in this large, complex application?

Well, let me describe what I did to find memory leaks in Microsoft's implementation of JavaScript back in the 1990s.

First, I ensured that in the debug version of my program, as many memory allocations as possible were routed to the same helper methods. That is, I redefined malloc, new, etc., to all be synonyms for an allocator that I wrote myself.

That allocator was just a thin shell around an operating system virtual heap memory allocator, but it had some extra smarts. It allocated extra memory at the beginning and end of each block and filled it with sentinel values, and it maintained a threadsafe count of the number of allocations so far and a threadsafe doubly-linked list of all allocations. The "free" routine would verify that the sentinel values on both sides were still intact; if they were not, there was memory corruption somewhere. It would then unlink the block from the linked list and free it.

At any point I could ask the memory manager for a list of all the outstanding blocks in memory in the order they had been allocated. Any items left in the list when the DLL was unloaded were memory leaks.

Those tools enabled me to find memory leaks and memory corruptions in real time very easily.
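The allocator described above could be sketched in C roughly as follows. This is a minimal illustration, not the original Microsoft code: the names `dbg_malloc`, `dbg_free`, `dbg_report_live`, `GUARD_SIZE` and `GUARD_BYTE` are all made up for this example, it wraps plain `malloc`/`free` rather than an OS virtual heap, and it assumes the header size keeps user data suitably aligned (which holds for typical 32-bit and 64-bit layouts).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16     /* bytes of sentinel on each side (assumed value) */
#define GUARD_BYTE 0xAB   /* sentinel fill pattern (assumed value) */

/* Bookkeeping header placed directly in front of every user block. */
typedef struct dbg_header {
    struct dbg_header *prev, *next;   /* doubly-linked list of live blocks */
    size_t size;                      /* size the caller asked for */
    unsigned long serial;             /* allocation number, gives the order */
    unsigned char guard[GUARD_SIZE];  /* front sentinel, adjacent to user data */
} dbg_header;

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static dbg_header *g_head = NULL, *g_tail = NULL;
static unsigned long g_serial = 0;

static int guard_intact(const unsigned char *g) {
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (g[i] != GUARD_BYTE) return 0;
    return 1;
}

/* Allocates size bytes surrounded by sentinels and records the block. */
void *dbg_malloc(size_t size) {
    dbg_header *h = malloc(sizeof *h + size + GUARD_SIZE);
    if (!h) return NULL;
    unsigned char *user = (unsigned char *)(h + 1);
    h->size = size;
    memset(h->guard, GUARD_BYTE, GUARD_SIZE);     /* front sentinel */
    memset(user + size, GUARD_BYTE, GUARD_SIZE);  /* rear sentinel */

    pthread_mutex_lock(&g_lock);
    h->serial = ++g_serial;
    h->next = NULL;
    h->prev = g_tail;                 /* append, so the list stays in allocation order */
    if (g_tail) g_tail->next = h; else g_head = h;
    g_tail = h;
    pthread_mutex_unlock(&g_lock);
    return user;
}

/* Checks both sentinels, unlinks the block and frees it.
   Returns 0 if the sentinels were intact, -1 if corruption was detected. */
int dbg_free(void *ptr) {
    if (!ptr) return 0;
    dbg_header *h = (dbg_header *)ptr - 1;
    unsigned char *user = ptr;
    int ok = guard_intact(h->guard) && guard_intact(user + h->size);
    if (!ok)
        fprintf(stderr, "dbg_free: corrupted sentinel around block #%lu (%zu bytes)\n",
                h->serial, h->size);

    pthread_mutex_lock(&g_lock);
    assert(g_head != NULL);           /* an empty list here means an untracked or already freed block */
    if (h->prev) h->prev->next = h->next; else g_head = h->next;
    if (h->next) h->next->prev = h->prev; else g_tail = h->prev;
    pthread_mutex_unlock(&g_lock);
    free(h);
    return ok ? 0 : -1;
}

/* Prints every outstanding block in allocation order and returns the count. */
size_t dbg_report_live(void) {
    size_t n = 0;
    pthread_mutex_lock(&g_lock);
    for (dbg_header *h = g_head; h; h = h->next, n++)
        fprintf(stderr, "live block #%lu: %zu bytes\n", h->serial, h->size);
    pthread_mutex_unlock(&g_lock);
    return n;
}
```

In a debug build one might route allocations here with something like `#define malloc dbg_malloc`, then call `dbg_report_live()` at shutdown: anything it still lists is a leak, and a `-1` from `dbg_free` points at an overrun or underrun of that particular block. Note that a sketch like this only notices corruption when the damaged block is freed, not at the moment of the bad write.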

Eric Lippert answered Oct 20 '22 15:10


With > 1 workers, I get: a. usually heap-corruption error, b. double-free. When running with valgrind -v with > 1 workers the program completes successfully

Based on the above symptoms, it looks to me as though there is clearly some sort of synchronization problem in your program. It looks like your threads are sharing heap memory addresses, so whenever there is a data race you run into problems.

You also mentioned that your program completes successfully when run under valgrind -v. This indicates that the synchronization problem depends on the sequence and timing of execution. These are among the most difficult bugs to find. We should also remember that dynamic tools give no warning until the program actually executes something wrong. That is, there may be a problem in the program, but the order of execution (since the problem is timing-related) determines whether the tools capture the failure.

Having said that, I think there is no shortcut to finding such bugs in big programs. However, I strongly suspect that a data race is leading to the memory corruption/double free. So you may want to use Helgrind to look for data races and other threading problems that might be causing the memory corruption.
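To make the suspected failure mode concrete, here is a small C sketch of one way a thread pool can double-free: several workers release a shared buffer, and the last one out frees it. The `shared_buf` type and the `buf_release` and `run_workers` functions are hypothetical, invented for this illustration; nothing here is taken from the asker's program. The version shown is the correct one, with the decrement-and-free step protected by a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical buffer shared by several pool workers. */
typedef struct {
    pthread_mutex_t lock;
    int refs;         /* workers that still hold the buffer */
    int free_calls;   /* how many times data was freed, for checking */
    char *data;
} shared_buf;

/* Drops one reference; the last holder frees the data. */
static void buf_release(shared_buf *b) {
    /* Without this lock, two threads can interleave on --refs (or both
       observe it reaching 0) and free(b->data) twice. */
    pthread_mutex_lock(&b->lock);
    assert(b->refs > 0);              /* released more often than acquired? */
    if (--b->refs == 0) {
        free(b->data);
        b->data = NULL;
        b->free_calls++;
    }
    pthread_mutex_unlock(&b->lock);
}

static void *worker(void *arg) {
    buf_release(arg);
    return NULL;
}

/* Starts n workers that each release the buffer once.
   Returns how many times the data was freed (1 when correct), or -1 on setup failure. */
int run_workers(int n) {
    if (n <= 0) return -1;
    shared_buf b;
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    b.data = malloc(64);
    if (!tids || !b.data) { free(tids); free(b.data); return -1; }
    pthread_mutex_init(&b.lock, NULL);
    b.refs = n;
    b.free_calls = 0;

    int started = 0;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, worker, &b) != 0) break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = started; i < n; i++)
        buf_release(&b);              /* release on behalf of workers that never started */

    pthread_mutex_destroy(&b.lock);
    free(tids);
    return b.free_calls;
}
```

To experiment, call `run_workers(4)` from a small `main`, build with `gcc -g -pthread`, and run the binary under `valgrind --tool=helgrind` (or `valgrind --tool=drd`). With the lock in place the run should be clean. If you remove the lock and unlock calls from `buf_release`, Helgrind should report conflicting unsynchronized accesses to `refs` even on runs where the double free does not actually happen. That is the advantage over Memcheck here: it flags the unsynchronized access pattern rather than waiting for the corruption to occur.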

Mantosh Kumar answered Oct 20 '22 16:10