How do memory leak detectors actually work? What are the underlying concepts in general? Can take C++ as the language to explain this.
The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed.
The best approach to checking for the existence of a memory leak in your application is by looking at your RAM usage and investigating the total amount of memory been used versus the total amount available. Evidently, it is advisable to obtain snapshots of your memory's heap dump while in a production environment.
Memory testing involves validating a C or C++ application's use of memory and looking for memory leaks or illegal management of memory, such as buffer overwrites. Memory leaks can be catastrophic to an application, resulting in hangs, buffering, or crashes. In the worst-case scenario, they are reported by a customer.
There are a couple of different ways that leak detectors work. You can replace the implementation of malloc
and free
with ones that can track more information during allocation and are not concerned with performance. This is similar to how dmalloc
works. In general, any address that is malloc
'ed but not free
'd is leaked.
The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.
In practice, you'll probably want more than just the single line number, but rather a stack trace for the lost allocations.
Another approach is how valgrind works which implements an entire virtual machine to keep track of addresses and memory references and associated bookkeeping. The valgrind approach is much more expensive, but also much more effective as it can also tell you about other types of memory errors like out of bounds reads or writes.
Valgrind essentially instruments the underlying instructions and can track when a given memory address has no more references. It can do this by tracking assignments of addresses, and so it can tell you not just that a piece of memory was lost, but exactly when it became lost.
C++ makes things a little harder for both types of leak detectors because it adds the new
and delete
operators. Technically new
can be a completely different source of memory than malloc
. However, in practice many real C++ implementations just use malloc
to implement new
or have an option to use malloc
instead of the alternate approach.
Also higher level languages like C++ tend to have alternative higher level ways of allocating memory like std::vector
or std::list
. A basic leak detector would report the potentially many allocations made by the higher level modes separately. That's much less useful than saying the entire container was lost.
Here's a published technical paper on how our CheckPointer tool works.
Fundamentally it tracks the lifetimes of all values (heap and stack), and their sizes according their types as defined by the language. This allows CheckPointer to find not only leaks, but out-of-array bound accesses, even for arrays in the stack, which valgrind won't do.
In particular, it analyzes the source code to find all pointer uses. (This is quite the task just by itself).
It keeps track of pointer meta data for each pointer, consisting of
It also tracks the kind and location of each object, i.e. whether it is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant:
All this tracking is accomplished by transforming the original program source, into a program which does what the original program does, and interleaves various meta-data checking or updating routines. The resulting program is compiled and run. Where a meta-data check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With