
C++ function instrumentation via clang++'s -finstrument-functions : how to ignore internal std library calls?

Let's say I have a function like:

template<typename It, typename Cmp>
void mysort( It begin, It end, Cmp cmp )
{
    std::sort( begin, end, cmp );
}

When I compile this using -finstrument-functions-after-inlining with clang++ --version:

clang version 11.0.0 (...)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: ...

The instrumentation blows up the execution time, because my entry and exit functions are called for every call of

void std::__introsort_loop<...>(...)
void std::__move_median_to_first<...>(...)

I'm sorting a really big array, so my program doesn't finish: without instrumentation it takes around 10 seconds, with instrumentation I've cancelled it at 10 minutes.

I've tried adding __attribute__((no_instrument_function)) to mysort (and the function that calls mysort), but this doesn't seem to have an effect as far as these standard library calls are concerned.

Does anyone know if it is possible to ignore function instrumentation for the internals of a standard library function like std::sort? Ideally, I would only have mysort instrumented, so a single entry and a single exit!

I see that clang++ sadly does not yet support anything like -finstrument-functions-exclude-function-list or -finstrument-functions-exclude-file-list, while g++ does not yet support -finstrument-functions-after-inlining, which I would ideally have, so I'm stuck!

EDIT: After playing more, it would appear the effect on execution-time is actually less than that described, so this isn't the end of the world. The problem still remains however, because most people who are doing function instrumentation in clang will only care about the application code, and not those functions linked from (for example) the standard library.

EDIT2: To further highlight the problem now that I've got it running in a reasonable time frame: the resulting trace that I produce from the instrumented code with those two standard library functions is 15GB. When I hard code my tracing to ignore the two function addresses, the resulting trace is 3.7MB!

ricky116 asked Jul 20 '20 16:07


1 Answer

I've run into the same problem. It looks like support for these flags was once proposed, but never merged into the main branch.

https://reviews.llvm.org/D37622

This is not a direct answer, since the tool doesn't support what you want to do, but I think I have a decent workaround: a "skip list" of sorts. I would guess that the part of the instrumentation hooks (__cyg_profile_func_enter and __cyg_profile_func_exit) contributing most to your execution time is the printing. If the hooks return early for functions you don't care about, that should help, even if it isn't ideal. At the very least it will limit the size of the output file.

Something like

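#include <stddef.h>   // size_t
#include <stdint.h>   // uintptr_t

uintptr_t skipAddrs[] = {
    // assuming 64-bit addresses
    0x123456789abcdef, 0x2468ace2468ace24
};
size_t arrSize = 0;

int main(void)
{
    ...

    arrSize = sizeof(skipAddrs)/sizeof(skipAddrs[0]);
    // https://stackoverflow.com/a/37539/12940429

    ...
}

// The hook itself should not be instrumented, hence the attribute.
// When building as C++, also declare it extern "C".
__attribute__((no_instrument_function))
void __cyg_profile_func_enter (void *this_fn, void *call_site) {
    for (size_t idx = 0; idx < arrSize; idx++) {
        if ((uintptr_t) this_fn == skipAddrs[idx]) {
            return;   // skipped function: record nothing
        }
    }
    // ... otherwise record the entry as usual ...
    // __cyg_profile_func_exit needs the same check.
}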

I use something like objdump -t binaryFile to examine the symbol table and find what the addresses are for each function.
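Since the hook runs on every single call, a linear scan can get expensive once the skip list grows beyond a handful of entries. Here is a rough sketch of the same idea with a sorted list and a binary search. The addresses and the names kSkipAddrs and should_skip are made up for illustration; real addresses would come from your own binary's symbol table. The search is written by hand on purpose: calling something like std::binary_search from inside the hook could itself be instrumented and re-enter the hook.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical placeholder addresses, kept in ascending order so the
// lookup below can binary-search them.
static const std::uintptr_t kSkipAddrs[] = {
    0x0000000000401a10,
    0x0000000000401b40,
    0x0000000000402c80,
};
static const std::size_t kSkipCount = sizeof(kSkipAddrs) / sizeof(kSkipAddrs[0]);

// Returns true if fn is in the skip list. Marked no_instrument_function
// so that calling it from the hooks does not trigger the hooks again.
extern "C" __attribute__((no_instrument_function))
bool should_skip(const void *fn) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fn);
    std::size_t lo = 0;
    std::size_t hi = kSkipCount;  // search the half-open range [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (kSkipAddrs[mid] == addr) {
            return true;
        }
        if (kSkipAddrs[mid] < addr) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return false;
}

extern "C" __attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    if (should_skip(this_fn)) {
        return;
    }
    // ... record the entry event here ...
}

extern "C" __attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    if (should_skip(this_fn)) {
        return;
    }
    // ... record the exit event here ...
}
```

Both hooks apply the same check, so a skipped function produces neither an entry nor an exit record.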

If you specifically want to ignore library calls, something that might work is examining the symbol table of your object file(s) before linking against libraries, then ignoring all the ones that appear new in the final binary.

All this should be possible with tools like grep, awk, or Python.
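For what it's worth, here is a minimal sketch of building the skip list from a symbol-table dump, written in C++ instead of the shell tools mentioned above. Note that it uses a different heuristic from comparing symbol tables before and after linking: it keeps any symbol whose Itanium-ABI mangled name starts with a std or __gnu_cxx prefix. The prefix list is my own assumption and is not exhaustive, and the function names is_library_symbol and build_skip_list are hypothetical. It assumes each symbol line starts with a hex address and ends with the symbol name, which is how objdump -t prints them. Also keep in mind that these addresses only match the run-time addresses for a non-PIE executable; for a PIE you would have to add the load base.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// True if the mangled name looks like it lives in namespace std or
// __gnu_cxx. Assumed prefix list, not exhaustive.
bool is_library_symbol(const std::string &mangled) {
    static const char *const prefixes[] = {
        "_ZSt", "_ZNSt", "_ZNKSt", "_ZN9__gnu_cxx", "_ZNK9__gnu_cxx",
    };
    for (const char *prefix : prefixes) {
        if (mangled.rfind(prefix, 0) == 0) {  // name starts with prefix
            return true;
        }
    }
    return false;
}

// Reads symbol-table text line by line and returns the sorted, unique
// addresses of all symbols that is_library_symbol accepts.
std::vector<std::uintptr_t> build_skip_list(std::istream &in) {
    std::vector<std::uintptr_t> addrs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first, token, last;
        if (!(fields >> first)) {
            continue;  // blank line
        }
        while (fields >> token) {
            last = token;  // the last field is taken as the symbol name
        }
        if (last.empty() || !is_library_symbol(last)) {
            continue;
        }
        char *end = nullptr;
        const unsigned long long value = std::strtoull(first.c_str(), &end, 16);
        if (end == first.c_str() || *end != '\0' || value == 0) {
            continue;  // not a hex address, or an undefined symbol (address 0)
        }
        addrs.push_back(static_cast<std::uintptr_t>(value));
    }
    std::sort(addrs.begin(), addrs.end());
    addrs.erase(std::unique(addrs.begin(), addrs.end()), addrs.end());
    return addrs;
}
```

You could run this as a small separate tool over the output of objdump -t and paste or generate the resulting addresses into the skip list that the hooks consult.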

thatjames answered Oct 06 '22 00:10