
Memory Leak Detectors Working Principle

How do memory leak detectors actually work? What are the underlying concepts in general? Feel free to use C++ as the language to explain this.

asked Feb 11 '15 04:02 by amit1990



2 Answers

There are a couple of different ways that leak detectors work. One is to replace the implementation of malloc and free with versions that record extra information on every allocation and are not concerned with performance. This is similar to how dmalloc works. In general, any address that is malloc'ed but never free'd is leaked.

The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.
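The lookup table described above can be sketched roughly as follows. This is a minimal illustration, not how dmalloc or any particular tool is implemented; the names AllocInfo, tracked_malloc, tracked_free, TRACKED_MALLOC, TRACKED_FREE and report_leaks are made up for this example, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>

// Hypothetical bookkeeping record: size and call site of a live block.
struct AllocInfo {
    std::size_t size;
    const char* file;
    int line;
};

// Lookup table of live allocations, keyed by address.
// A function-local static guarantees it exists before first use.
static std::map<void*, AllocInfo>& live_allocs() {
    static std::map<void*, AllocInfo> table;
    return table;
}

void* tracked_malloc(std::size_t size, const char* file, int line) {
    void* p = std::malloc(size);
    if (p != nullptr) live_allocs()[p] = AllocInfo{size, file, line};
    return p;
}

void tracked_free(void* p) {
    if (p == nullptr) return;
    live_allocs().erase(p);  // a freed block leaves the table
    std::free(p);
}

// The macros capture the call site (assumed naming, for illustration only).
#define TRACKED_MALLOC(n) tracked_malloc((n), __FILE__, __LINE__)
#define TRACKED_FREE(p) tracked_free(p)

// Print every block still in the table and return how many there are.
std::size_t report_leaks() {
    for (const auto& entry : live_allocs()) {
        std::printf("leak: %zu bytes at %p, allocated at %s:%d\n",
                    entry.second.size, entry.first,
                    entry.second.file, entry.second.line);
    }
    return live_allocs().size();
}

// Small demonstration: one of two blocks is still live at report time.
std::size_t demo_leak_count() {
    void* a = TRACKED_MALLOC(16);
    void* b = TRACKED_MALLOC(32);
    TRACKED_FREE(a);
    std::size_t leaked = report_leaks();  // only b is still in the table
    TRACKED_FREE(b);
    assert(report_leaks() == 0);
    return leaked;
}
```

Calling demo_leak_count() from main prints one leak line for the 32-byte block. A real tool would typically call something like report_leaks() at program exit.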

In practice, you'll probably want more than a single line number: ideally a full stack trace for each lost allocation.

Another approach is the one Valgrind takes: it implements an entire virtual machine to keep track of addresses, memory references, and the associated bookkeeping. The Valgrind approach is much more expensive, but also much more effective, since it can also tell you about other types of memory errors, such as out-of-bounds reads or writes.

Valgrind essentially instruments the underlying machine instructions and keeps shadow bookkeeping for the memory the program touches. For leaks, its Memcheck tool scans memory for pointers to each heap block that is still allocated, either when the program exits or on request, and reports the blocks that no remaining pointer reaches as lost, along with the stack trace of the allocation.
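One way to decide that a block has no more references is a reachability scan: start from a set of root words (think stack and globals), follow anything that looks like a pointer to a known block, and treat whatever is never reached as lost. The sketch below runs this on a toy heap made of simulated addresses. It is not Valgrind's implementation; the types Addr, Block, Heap and the function find_lost are assumptions made up for this example, and only pointers to the start of a block are recognized.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy model: a heap block is a start address plus the words it contains.
using Addr = std::uintptr_t;
struct Block {
    std::vector<Addr> words;
};
using Heap = std::map<Addr, Block>;  // keyed by block start address

// Return the start addresses of all blocks that no root can reach.
std::set<Addr> find_lost(const Heap& heap, const std::vector<Addr>& roots) {
    std::set<Addr> reached;
    std::vector<Addr> work(roots.begin(), roots.end());
    while (!work.empty()) {
        Addr candidate = work.back();
        work.pop_back();
        auto it = heap.find(candidate);
        if (it == heap.end()) continue;                   // not a block pointer
        if (!reached.insert(candidate).second) continue;  // already visited
        for (Addr word : it->second.words) work.push_back(word);
    }
    std::set<Addr> lost;
    for (const auto& entry : heap) {
        if (reached.count(entry.first) == 0) lost.insert(entry.first);
    }
    return lost;
}

// Demonstration heap: A -> B is reachable from a root, C -> D is not.
std::set<Addr> demo_lost_blocks() {
    Heap heap;
    heap[0x1000].words = {0x2000, 42};  // block A holds a pointer to B
    heap[0x2000].words = {7};           // block B holds plain data
    heap[0x3000].words = {0x4000};      // block C holds a pointer to D
    heap[0x4000];                       // block D is empty
    std::vector<Addr> roots = {0x1000, 99};  // 99 is not a block address
    return find_lost(heap, roots);
}
```

Here demo_lost_blocks() returns the blocks at 0x3000 and 0x4000: nothing points to C, and D is only reachable through C.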

C++ makes things a little harder for both types of leak detectors because it adds the new and delete operators. Technically new can be a completely different source of memory than malloc. However, in practice many real C++ implementations just use malloc to implement new or have an option to use malloc instead of the alternate approach.
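To illustrate how new can draw from a source other than malloc, here is a sketch of a class with its own operator new and operator delete backed by a fixed static pool. A detector that only hooks malloc and free would never see these allocations. The class Node, the pool size kSlots and the helper from_pool are hypothetical names for this example, and the pool is deliberately simplistic and not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical class whose objects come from a private pool, not malloc.
struct Node {
    int value = 0;

    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;

    static bool from_pool(const void* p);
    static int live;  // objects currently allocated from the pool
};

namespace {
constexpr std::size_t kSlots = 8;
alignas(Node) unsigned char g_pool[kSlots * sizeof(Node)];
bool g_used[kSlots] = {};
}  // namespace

int Node::live = 0;

void* Node::operator new(std::size_t size) {
    if (size != sizeof(Node)) throw std::bad_alloc();
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (!g_used[i]) {
            g_used[i] = true;
            ++live;
            return g_pool + i * sizeof(Node);
        }
    }
    throw std::bad_alloc();  // pool exhausted
}

void Node::operator delete(void* p) noexcept {
    if (p == nullptr) return;
    std::size_t slot =
        static_cast<std::size_t>(static_cast<unsigned char*>(p) - g_pool) /
        sizeof(Node);
    g_used[slot] = false;
    --live;
}

bool Node::from_pool(const void* p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto begin = reinterpret_cast<std::uintptr_t>(g_pool);
    return addr >= begin && addr < begin + sizeof(g_pool);
}

// Demonstration: both objects live in the pool; malloc is never involved.
int demo_pool_allocations() {
    Node* first = new Node;
    Node* second = new Node;
    assert(Node::from_pool(first) && Node::from_pool(second));
    delete second;
    int live_now = Node::live;  // only `first` is still allocated
    delete first;
    return live_now;
}
```

To catch a forgotten delete here, the detector has to hook the class's own allocation functions (or count live objects, as Node::live does) rather than rely on malloc-level tracking.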

Also, higher-level languages like C++ tend to offer higher-level ways of allocating memory, such as std::vector or std::list. A basic leak detector would report each of the potentially many allocations made by such a container separately. That's much less useful than saying that the entire container was lost.
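To see how many separate allocations one container can stand for, the sketch below plugs a counting allocator into std::list. CountingAlloc and the two counters are hypothetical names for this example; the exact number of allocations depends on the standard library, so the code only assumes at least one node per element.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

static long g_total_allocs = 0;  // every allocate() call so far
static long g_live_allocs = 0;   // allocate() calls not yet deallocated

// Minimal allocator (assumed for illustration) that counts its calls.
template <class T>
struct CountingAlloc {
    using value_type = T;

    CountingAlloc() = default;
    template <class U>
    CountingAlloc(const CountingAlloc<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_total_allocs;
        ++g_live_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        --g_live_allocs;
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) noexcept {
    return false;
}

// Returns how many allocations a five-element list performed.
long demo_list_allocations() {
    long before = g_total_allocs;
    {
        std::list<int, CountingAlloc<int>> values;
        for (int i = 0; i < 5; ++i) values.push_back(i);
        assert(g_live_allocs >= 5);  // at least one node per element
    }
    assert(g_live_allocs == 0);  // the destructor released every node
    return g_total_allocs - before;
}
```

If such a list were itself leaked, a table-based detector would list each node as a separate lost block, which is the noise the paragraph above describes.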

answered Oct 18 '22 18:10 by b4hand


Here's a published technical paper on how our CheckPointer tool works.

Fundamentally, it tracks the lifetimes of all values (heap and stack) and their sizes according to their types as defined by the language. This allows CheckPointer to find not only leaks but also out-of-bounds array accesses, even for arrays on the stack, which valgrind won't do.

In particular, it analyzes the source code to find all pointer uses. (This is quite a task in itself.)

It keeps track of pointer metadata for each pointer, consisting of

  • A reference to the object metadata for the heap-allocated object, global or local variable, or function pointed to by the pointer, and
  • The address range of the (sub)object of the object that the pointer may currently access. This may be smaller than the address range of the whole object; e.g. if you take the address of a struct member, the instrumented source code will only allow access to that member when using the resulting pointer.

It also tracks object metadata for each object: its kind and location (i.e. whether it is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant), together with:

  • The address range of the object that may be safely accessed, and
  • For each pointer stored in the heap-allocated object or variable, a reference to the pointer metadata for that pointer.

All this tracking is accomplished by transforming the original program source into a program that does what the original program does and interleaves various metadata checking and updating routines. The resulting program is compiled and run. Where a metadata check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...).
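The idea of a pointer that carries the address range it may access can be sketched by hand as a small wrapper class. This is only an illustration of the concept: CheckPointer inserts its checks by transforming the source, and its actual metadata layout is not shown in this answer. The class CheckedPtr and its member narrow are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat pointer": a base address, the number of elements it
// may access, and the current position relative to the base.
template <class T>
class CheckedPtr {
public:
    CheckedPtr(T* base, std::size_t count)
        : base_(base), count_(count), index_(0) {}

    // Pointer arithmetic only moves the index; nothing is checked yet.
    CheckedPtr operator+(std::ptrdiff_t n) const {
        CheckedPtr moved = *this;
        moved.index_ += n;
        return moved;
    }

    // Restrict access to n elements starting at `first`, counted from
    // the start of the current range (like taking a member's address).
    CheckedPtr narrow(std::size_t first, std::size_t n) const {
        if (first > count_ || n > count_ - first)
            throw std::out_of_range("sub-range outside valid bounds");
        return CheckedPtr(base_ + first, n);
    }

    // The check happens on dereference.
    T& operator*() const {
        if (index_ < 0 || static_cast<std::size_t>(index_) >= count_)
            throw std::out_of_range("pointer outside valid bounds");
        return base_[index_];
    }

private:
    T* base_;
    std::size_t count_;
    std::ptrdiff_t index_;
};

// Returns the number of bounds violations caught (two in this demo).
int demo_bounds_checks() {
    int data[4] = {10, 20, 30, 40};  // a stack array
    CheckedPtr<int> p(data, 4);
    int violations = 0;

    *(p + 3) = 41;  // last valid element: allowed
    try { *(p + 4) = 0; } catch (const std::out_of_range&) { ++violations; }

    CheckedPtr<int> q = p.narrow(1, 2);  // may touch data[1], data[2] only
    assert(*q == 20);
    try { *(q + 2) = 0; } catch (const std::out_of_range&) { ++violations; }

    assert(data[3] == 41);  // the rejected write through q never happened
    return violations;
}
```

The second violation is the interesting one: data[3] is a perfectly valid element of the array, but the narrowed pointer q is not allowed to reach it, which mirrors the struct-member case described above.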

answered Oct 18 '22 18:10 by Ira Baxter