
How do I exclude allocations in a tight loop from ASAN?

In a previous question, it was discovered that using recent versions of GNU libstdc++ to read a series of numbers from a space-separated human-readable file (mirror) causes a ton of allocations, scaling linearly with the size of the file.

Given the file linked above and this test program:

#include <fstream>

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
    while (ww15mgh >> value);
    return 0;
}

Valgrind --tool=memcheck reports:

==523661==   total heap usage: 1,038,970 allocs, 1,038,970 frees, 59,302,487 

Because each of those million allocations is immediately freed before operator>> returns, there are no leaks and the actual memory footprint of the program in a release build is tiny (81KB). But, compiling with -fsanitize=address turns that mass of allocations into a Real Problem.

Here's the total memory footprint of the above program, running with and without ASAN:

$ g++ stackoverflow.cpp -o _build/stackoverflow
$ /usr/bin/time -v _build/stackoverflow |& grep 'm r'
    Maximum resident set size (kbytes): 3512
$ g++ stackoverflow.cpp -o _build/stackoverflow_asan -fsanitize=address
$ /usr/bin/time -v _build/stackoverflow_asan |& grep 'm r'
    Maximum resident set size (kbytes): 125196

125MB doesn't seem like a huge problem, but in the context of a much larger unit test program running on an embedded ARM board and calling this function several times, it ran my ARM CI environment out of memory.

A workaround for this specific case

#include <fstream>
#include <string>
#include <cstdlib>  // std::strtod

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
#if defined(__SANITIZE_ADDRESS__) && (defined(__GLIBCXX__) || defined(__GLIBCPP__))
    std::string text;
    while (ww15mgh >> text)
        value = std::strtod(text.data(), nullptr);
#else
    while (ww15mgh >> value);
#endif
    return 0;
}

Using this preprocessor gate yields a much more manageable total memory footprint:

$ g++ stackoverflow_workaround.cpp -o _build/stackoverflow_workaround_asan -fsanitize=address
$ /usr/bin/time -v _build/stackoverflow_workaround_asan |& grep 'm r'
    Maximum resident set size (kbytes): 6396

The footprint drops because neither libstdc++'s operator>>(ifstream&, string&) nor glibc's strtod performs superfluous allocations, as can be seen by tricking the workaround into running under valgrind:

$ g++ stackoverflow_workaround.cpp -D__SANITIZE_ADDRESS__
$ valgrind --tool=memcheck --leak-check=yes ./a.out |& grep 'total heap'
==2483624==   total heap usage: 3 allocs, 3 frees, 81,368 bytes allocated

Sample Code and CI Pipeline Results for this are available on gitlab.

At this point my CI is no longer running out of memory and crashing, so my co-workers can carry on with their lives. I, however, feel like hiding stuff from the sanitizer with #ifdef __SANITIZE_ADDRESS__ is somehow cheating.

The Question

Is there a way to make the original program run under ASAN, but skip ASAN's allocator padding just for the duration of the operator>> call? More generally, when a tight loop calls a third-party function that allocates memory, how do I avoid an enormous memory footprint with -fsanitize=address?

Asked by Dan, Jan 14 '21 14:01


1 Answer

As you say, AddressSanitizer delays the reuse of freed memory to help catch use-after-free errors. This feature is called "quarantine", and the amount of memory used for it is configurable at runtime; see https://github.com/google/sanitizers/wiki/AddressSanitizerFlags. For example, if you set the environment variable ASAN_OPTIONS to quarantine_size_mb=4 before running your program, it should limit the memory held in quarantine to 4 megabytes.

This is not specific to the call in question, so it doesn't exactly address what you asked, but I think it will solve your underlying problem of "how to use AddressSanitizer on a machine with low memory".

Answered by Nate Eldredge, Oct 02 '22 12:10