In a previous question, it was discovered that using recent versions of GNU libstdc++ to read a series of numbers from a space-separated human-readable file (mirror) causes a ton of allocations, scaling linearly with the size of the file.
Given the file linked above and this test program:
#include <fstream>
int main(int, char**) {
std::ifstream ww15mgh("ww15mgh.grd");
double value;
while (ww15mgh >> value);
return 0;
}
Valgrind --tool=memcheck
reports:
==523661== total heap usage: 1,038,970 allocs, 1,038,970 frees, 59,302,487 bytes allocated
Because each of those million allocations is immediately freed before operator>>
returns, there are no leaks and the actual memory footprint of the program in a release build is tiny (81KB). But, compiling with -fsanitize=address
turns that mass of allocations into a Real Problem.
Here's the total memory footprint of the above program, running with and without ASAN:
$ g++ stackoverflow.cpp -o _build/stackoverflow
$ /usr/bin/time -v _build/stackoverflow |& grep 'm r'
Maximum resident set size (kbytes): 3512
$ g++ stackoverflow.cpp -o _build/stackoverflow_asan -fsanitize=address
$ /usr/bin/time -v _build/stackoverflow_asan |& grep 'm r'
Maximum resident set size (kbytes): 125196
125MB doesn't seem like a huge problem, but in the context of a much larger unit test program running on an embedded ARM board and calling this function several times, it ran my ARM CI environment out of memory.
A workaround for this specific case
#include <fstream>
#include <string>
#include <cstdlib>
int main(int, char**) {
std::ifstream ww15mgh("ww15mgh.grd");
double value;
#if defined(__SANITIZE_ADDRESS__) && (defined(__GLIBCXX__) || defined(__GLIBCPP__))
std::string text;
while (ww15mgh >> text)
value = std::strtod(text.data(), nullptr);
#else
while (ww15mgh >> value);
#endif
return 0;
}
Using this preprocessor gate yields a much more manageable total memory footprint:
$ g++ stackoverflow_workaround.cpp -o _build/stackoverflow_workaround_asan -fsanitize=address
$ /usr/bin/time -v _build/stackoverflow_workaround_asan |& grep 'm r'
Maximum resident set size (kbytes): 6396
This is because neither libstdc++'s operator>>(ifstream&, string&)
nor glibc's strtod
makes superfluous allocations, as can be seen by tricking the workaround into running under valgrind:
$ g++ stackoverflow_workaround.cpp -D__SANITIZE_ADDRESS__
$ valgrind --tool=memcheck --leak-check=yes ./a.out |& grep 'total heap'
==2483624== total heap usage: 3 allocs, 3 frees, 81,368 bytes allocated
Sample Code and CI Pipeline Results for this are available on gitlab.
At this point my CI is no longer running out of memory and crashing, so my co-workers can carry on with their lives. I, however, feel like hiding stuff from the sanitizer with #ifdef __SANITIZE_ADDRESS__
is somehow cheating.
The Question
Is there a way to make the original program run under ASAN, but skip ASAN's allocator padding just for the duration of the operator>>
call? In the general case, of a tight loop calling a third-party function that allocates memory, how do I avoid an enormous memory footprint with -fsanitize=address
?
As you say, AddressSanitizer will delay the reuse of freed memory, to help catch use-after-free errors. This feature is called "quarantine", and the amount of memory used for it is configurable at runtime; see https://github.com/google/sanitizers/wiki/AddressSanitizerFlags. For example, if you set the environment variable ASAN_OPTIONS
to quarantine_size_mb=4
before running your program, it should limit the amount of memory held in quarantine to 4 megabytes.
This is not specific to the call in question, so it doesn't exactly address what you asked, but I think it will solve your underlying problem of "how to use AddressSanitizer on a machine with low memory".