Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write a profiler?

Tags:

java

profiler

i would to know how to write a profiler? What books and / or articles recommended? Can anyone help me please?

Someone has already done something like this?

like image 719
Agusti-N Avatar asked Dec 15 '08 16:12

Agusti-N


People also ask

How do code profilers work?

They work by interrupting the application under test periodically in proportion to the consumption of the resource we're interested in. While the program is interrupted the profiler grabs a snapshot of its current state, which includes where in the code it is. After the state is captured the program continues.

What is sample profiling?

The Sampling profiler helps you make a quick overall look at the application's performance. It polls the profiled application at certain intervals and determines the routine that is currently being executed. It increases the sample count for that routine and reports the number of collected samples in results.

What is a profiler used for?

A criminal profiler is part of the investigative team and uses specialized techniques and training to identify suspects. Criminal profilers, also known as criminal investigative analysts, compile and compare data from similar crimes and offenders to create a profile of a suspect.


4 Answers

Encouraging lot, aren't we :)

Profilers aren't too hard if you're just trying to get a reasonable idea of where the program's spending most of its time. If you're bothered about high accuracy and minimum disruption, things get difficult.

So if yoyu just want the answers a profiler would give you, go for one someone else has written. If you're looking for the intellectual challenge, why not have a go at writing one?

I've written a couple, for run time environments that the years have rendered irrelevant.

There are two approaches

  • adding something to each function or other significant point that logs the time and where it is.

  • having a timer going off regularly and taking a peek where the program currently is.

The JVMPI version seems to be the first kind - the link provided by uzhin shows that it can report on quite a number of things (see section 1.3). What gets executed changes to do this, so the profiling can affect the performance (and if you're profiling what was otherwise a very lightweight but often called function, it can mislead).

If you can get a timer/interrupt telling you where the program counter was at the time of the interrupt, you can use the symbol table/debugging information to work out which function it was in at the time. This provides less information but can be less disruptive. A bit more information can be obtained from walking the call stack to identify callers etc. I've no idea if these is even possible in Java...

Paul.

like image 172
The Archetypal Paul Avatar answered Sep 28 '22 12:09

The Archetypal Paul


I would look at those open-source projects first:

  • Eclipse TPTP (http://www.eclipse.org/tptp/)
  • VisualVM (https://visualvm.dev.java.net/)

Then I would look at JVMTI (not JVMPI)

  • http://java.sun.com/developer/technicalArticles/Programming/jvmti/
like image 21
Boune Avatar answered Sep 28 '22 12:09

Boune


I wrote one once, mainly as an attempt to make "deep sampling" more user-friendly. When you do the method manually, it is explained here. It is based on sampling, but rather than take a large number of small samples, you take a small number of large samples.

It can tell you, for example, that instruction I (usually a function call) is costing you some percent X of total execution time, more or less, since it appears on the stack on X% of samples.

Think about it, because this is a key point. The call stack exists as long as the program is running. If a particular call instruction I is on the stack X% of the time, then if that instruction could disappear, that X% of time would disappear. This does not depend on how many times I is executed, or how long the function call takes. So timers and counters are missing the point. And in a sense all instructions are call instructions, even if they only call microcode.

The sampler is based on the premise that it is better to know the address of instruction I with precision (because that is what you are looking for) than to know the number X% with precision. If you know that you could save roughly 30% of time by recoding something, do you really care that you might be off by 5%? You're still going to want to fix it. The amount of time it actually saves won't be made any less or greater by your knowing X precisely.

So it is possible to drive samples off of a timer, but frankly I found it just as useful to trigger an interrupt by the user pressing both shift keys as the same time. Since 20 samples is generally plenty, and this way you can be sure to take samples at a relevant time (i.e. not while waiting for user input) it was quite adequate. Another way would be to only do the timer-driven samples while the user holds down both shift keys (or something like that).

It did not concern me that the taking of samples might slow down the program, because the goal was not to measure speed, but to locate the most costly instructions. After fixing something, the overall speedup is easy to measure.

The main thing that the profiler provided was a UI so you could examine the results painlessly. What comes out of the sampling phase is a collection of call stack samples, where each sample is a list of addresses of instructions, where every instruction but the last is a call instruction. The UI was mainly what is called a "butterfly view". It has a current "focus", which is a particular instruction. To the left is displayed the call instructions immediately above that instruction, as culled from the stack samples. If the focus instruction is a call instruction, then the instructions below it appear to the right, as culled from the samples. On the focus instruction is displayed a percent, which is the percent of stacks containing that instruction. Similarly for each instruction on the left or right, the percent is broken down by the frequency of each such instruction. Of course, the instruction was represented by file, line number, and the name of the function it was in. The user could easily explore the data by clicking any of the instructions to make it the new focus.

A variation on this UI treated the butterfly as bipartite, consisting of alternating layers of function call instructions and the functions containing them. That can give a little more clarity of time spent in each function.

Maybe it's not obvious, so it's worth mentioning some properties of this technique.

  • Recursion is not an issue, because if an instruction appears more than once on any given stack sample, that still counts as only one sample containing it. It still remains true that the estimated time that would be saved by its removal is the percent of stacks it is on.

  • Notice this is not the same as a call tree. It gives you the cost of an instruction no matter how many different branches of a call tree it is in.

  • Performance of the UI is not an issue, because the number of samples need not be very large. If a particular instruction I is the focus, it is quite simple to find how may samples contain it, and for each adjacent instruction, how many of the samples containing I also contain the adjacent instruction next to it.

  • As mentioned before, speed of sampling is not an issue, because we're not measuring performance, we're diagnosing. The sampling does not bias the results, because the sampling does not affect what the overall program does. An algorithm that takes N instructions to complete still takes N instructions even if it is halted any number of times.

  • I'm often asked how to sample a program that completes in milliseconds. The simple answer is wrap it in an outer loop to make it take long enough to sample. You can find out what takes X% of time, remove it, get the X% speedup, and then remove the outer loop.

This little profiler, that I called YAPA (yet another performance analyzer) was DOS-based and made a nice little demo, but when I had serious work to do, I would fall back on the manual method. The main reason for this is that the call stack alone is often not enough state information to tell you why a particular cycle is being spent. You may also need to know other state information so you have a more complete idea of what the program was doing at that time. Since I found the manual method pretty satisfactory, I shelved the tool.

A point that's often missed when talking about profiling is that you can do it repeatedly to find multiple problems. For example, suppose instruction I1 is on the stack 5% of the time, and I2 is on the stack 50% of the time. Twenty samples will easily find I2, but maybe not I1. So you fix I2. Then you do it all again, but now I1 takes 10% of the time, so 20 samples will probably see it. This magnification effect allows repeated applications of profiling to achieve large compounded speedup factors.

like image 23
user738647 Avatar answered Sep 28 '22 11:09

user738647


JVMPI spec: http://java.sun.com/j2se/1.5.0/docs/guide/jvmpi/jvmpi.html

I salute your courage and bravery

EDIT: And as noted by user Boune, JVMTI: http://java.sun.com/developer/technicalArticles/Programming/jvmti/

like image 39
Yoni Roit Avatar answered Sep 28 '22 11:09

Yoni Roit