The output of a typical profiler is a list of functions in your code, sorted by the amount of time each function took while the program ran.
This is very good, but sometimes I'm more interested in what the program was doing most of the time than in where the EIP was most of the time.
An example output of my hypothetical profiler is:
Waiting for file IO - 19% of execution time.
Waiting for network - 4% of execution time.
Cache misses - 70% of execution time.
Actual computation - 7% of execution time.
Is there such a profiler? Is it possible to derive such an output from a "standard" profiler?
I'm using Linux, but I'll be glad to hear any solutions for other systems.
Program profiling is an advanced optimization technique to reorder procedures, or code within procedures, in ILE programs and service programs based on statistical data gathered while running the program.
You can use profiling tools to identify which portions of the program are executed most frequently or where most of the time is spent. Profiling tools are typically used after a basic tool, such as the vmstat or iostat commands, shows that a CPU bottleneck is causing a performance problem.
Performance profilers are software development tools designed to help you analyze the performance of your applications and improve poorly performing sections of code.
Application profiling requires accurate knowledge of an application's transactional configuration and the interaction of the application with its persistent state during the course of each transaction. You can execute the analysis in either closed world or open world mode.
This is mainly a Solaris tool (ports exist for FreeBSD and macOS), but DTrace can monitor almost every kind of I/O, on/off CPU time, time in specific functions, sleep time, etc. I'm not sure if it can determine cache misses though, assuming you mean CPU cache - I'm not sure if that information is made available by the CPU or not.
Please take a look at this and this.
Consider any thread. At any instant of time it is doing something, and it is doing it for a reason, and slowness can be defined as the time it spends for poor reasons - it doesn't need to be spending that time.
Take a snapshot of the thread at a point in time. Maybe it's in a cache miss, in an instruction, in a statement, in a function, called from a call instruction in another function, called from another, and so on, up to call _main. Every one of those steps has a reason that an examination of the code reveals.
Maybe at that time the disk is coming around to a certain sector, so some data streaming can be started, so a buffer can be filled, so a read statement can be satisfied, in a function, and that function is called from a call site in another function, and that from another, and so on, up to call _main, or whatever happens to be the top of the thread.
So, the way to find bottlenecks is to find when the code is spending time for poor reasons, and the best way to find that is to take snapshots of its state. The EIP, or any other tiny piece of the state, is not going to do it, because it won't tell you why.
Very few profilers "get it". The ones that do are the wall-clock-time stack-samplers that report, by line of code (not by function), the percent of time each line is active (not the amount of time, and especially not "self" or "exclusive" time). One that does is Zoom, and there are others.
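To make the idea concrete, here is a minimal sketch of a wall-clock stack sampler in Python. It is not how Zoom or any other tool works internally, only an illustration of the technique. The names sample_stacks and workload are made up for this example, and sys._current_frames() is CPython-specific. Each sample walks the whole stack of the target thread, so a line counts as "active" whether the thread is computing or blocked underneath it.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, duration_s=1.0, interval_s=0.01):
    """Snapshot one thread's stack on wall-clock time; count each line once per sample."""
    line_hits = collections.Counter()
    samples = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)  # CPython-specific call
        if frame is not None:
            samples += 1
            on_stack = set()
            # Walk from the leaf frame up to the top of the thread.
            while frame is not None:
                code = frame.f_code
                on_stack.add((code.co_filename, frame.f_lineno, code.co_name))
                frame = frame.f_back
            line_hits.update(on_stack)
        time.sleep(interval_s)
    return samples, line_hits


def workload():
    # Hypothetical workload: mostly blocked, with a little computation.
    for _ in range(5):
        time.sleep(0.15)                    # stands in for waiting on I/O
        sum(i * i for i in range(50_000))   # stands in for actual computation


if __name__ == "__main__":
    worker = threading.Thread(target=workload)
    worker.start()
    samples, hits = sample_stacks(worker.ident, duration_s=1.0)
    worker.join()
    # Percent of samples in which each line was somewhere on the stack.
    for (filename, lineno, name), count in hits.most_common(5):
        print(f"{100 * count / samples:5.1f}%  {name}:{lineno}")
```

Because the sampler runs on wall-clock time, the sleep line should dominate the report even though it uses almost no CPU, which is exactly what a CPU-only profiler would hide.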
Looking at where the EIP hangs out is like trying to tell time on a clock with only a second hand. Measuring functions is like trying to tell time on a clock with some of the digits missing. Profiling only during CPU time, not during blocked time, is like trying to tell time on a clock that randomly stops running for long stretches. Being concerned about measurement precision is like trying to time your lunch hour to the second.
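As a rough check on how much a CPU-time-only profile would miss, you can compare wall-clock time against CPU time for a piece of code. The sketch below is an assumption-laden illustration: blocked_fraction and mostly_waiting are hypothetical names, and time.process_time() counts CPU time for the whole process, so the estimate only makes sense for a single-threaded run. It also cannot separate cache misses from computation, since a stalled CPU still counts as CPU time; that needs hardware performance counters.

```python
import time


def blocked_fraction(func):
    """Run func; return (wall seconds, CPU seconds, fraction of wall time spent off-CPU)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    off_cpu = max(0.0, wall - cpu)    # time the process was blocked or descheduled
    fraction = off_cpu / wall if wall > 0 else 0.0
    return wall, cpu, fraction


def mostly_waiting():
    # Hypothetical workload: a long wait followed by a short computation.
    time.sleep(0.3)
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    wall, cpu, fraction = blocked_fraction(mostly_waiting)
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s, off-CPU {100 * fraction:.0f}%")
```

If the off-CPU fraction is large, a profiler that samples only during CPU time is looking at the small part of the run.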
This is not a mysterious subject.