I am using google's perftools (http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html) for CPU profiling---it's a wonderful tool that has helped me perform a great deal of CPU-time improvements on my application.
Unfortunately, I have gotten to the point that the code is still a bit slow, and when compiled using g++'s -O3 optimization level, all I know is that a specific function is slow, but not which aspects of it are slow.
If I remove the -O3 flag, then the unoptimized portions of the program overtake this function, and I don't get a lot of clarity into the actual parts of the function that are slow. If I leave the -O3 flag in, then the slow parts of the function are inlined, and I can't determine which parts of the function are slow.
Any suggestions? Thanks for your help!
For something like this, I've always used the "old school" way of doing it:
Insert into the routine you want to measure at various points statements which measure the current time (or cputime). Then simply print out or log the differences between them and you will know how long each section of code took. From there you can find out what is eating most of the time, and go in and get fine-grained timing within that section until you know what the problem is, and how to fix it.
If the overhead of the function calls is not the problem, you can also force inlining to be off with -fno-inline-small-functions -fno-inline-functions -fno-inline-functions-called-once -fno-inline
(I'm not exactly sure how these switches interact with each other, but I think they are independent). Then you can use your ordinary profiler to look at the call graph profile and see what function calls are taking what amount of time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With