
Java Profiling, Performance Tuning and Memory Profiling exercises

I am about to conduct a workshop on profiling, performance tuning, memory profiling, memory leak detection, etc. of Java applications using JProfiler and Eclipse TPTP. I need a set of exercises that I could offer to participants where they can:

  • Use the tool to profile and discover the problem: bottleneck, memory leak, suboptimal code, etc.
  • Resolve the problem and implement optimized code
  • Demonstrate the solution by performing another session of profiling
  • Ideally, write a unit test that demonstrates the performance gain

Neither problems nor solutions should be overly complicated; it should be possible to resolve them in a matter of minutes at best and a matter of hours at worst. Some interesting areas to exercise:

  • Resolve memory leaks
  • Optimize loops
  • Optimize object creation and management
  • Optimize string operations
  • Resolve problems exacerbated by concurrency and concurrency bottlenecks

Ideally, exercises should include sample unoptimized code and the solution code. I am sure there is plenty of experience and plenty of real-life examples around.

Asked Aug 04 '10 by Dan

2 Answers

I tried to find real-life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.

Scenario: you have a time-consuming function that you want to run many times for different values, but the same values may pop up again (ideally not too long after the result was first computed). A good and simple example is URL / web page pairs that you need to download and process (for the exercise the downloading should probably be simulated).

Loops:

  • You want to check whether any of a set of words pops up in the pages. Use your function in a loop, but with the same value; pseudocode:

    for (word : words) {
        checkWord(download(url))
    }
    

    One solution is quite easy: just download the page before the loop (a sketch follows). The other solution, caching, is described below.
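
A minimal sketch of that fix in Java, assuming hypothetical `download(String url)` and `checkWord(String word, String page)` helpers:

    // Fetch the page once, outside the loop, instead of once per word.
    String page = download(url);
    for (String word : words) {
        checkWord(word, page);
    }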

Memory leak:

  • simple one: you can also solve your problem with a kind of cache. In the simplest case you can just put the results into a (static) map. But if you don't prevent it, its size will grow without bound -> memory leak.
    Possible solution: use an LRU map (a sketch follows after this list). Most likely performance will not degrade too much, but the memory leak should go away.
  • trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see later) and the values are instances of a class that contains the URL, the downloaded page and something else. You may assume that this should be fine, but in fact it is not: since the value (which is not weakly referenced) has a reference to the key (the URL), the key will never become eligible for cleanup -> nice memory leak (see the second sketch after this list).
    Solution: remove the URL from the value.
  • Same as before, but the URLs are interned strings ("to save some memory if we happen to have the same strings again") and the value does not refer to the key. I did not try it, but it seems to me that it would also cause a leak, because interned Strings cannot be GC-ed.
    Solution: do not intern, which also leads to the advice you must not skip: don't do premature optimization, as it is the root of all evil.
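
A minimal sketch of the LRU cache mentioned in the first item, using a `LinkedHashMap` in access order; the size limit of 1000 is an arbitrary assumption for the exercise:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Bounded LRU cache: the least recently used entry is evicted once the limit is exceeded.
    class PageCache extends LinkedHashMap<String, String> {
        private static final int MAX_ENTRIES = 1000; // arbitrary bound for the exercise

        PageCache() {
            super(16, 0.75f, true); // accessOrder = true gives least-recently-used ordering
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > MAX_ENTRIES;
        }
    }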
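
And a sketch of the `WeakHashMap` pitfall from the second item; the class and field names are illustrative, the point is that the value holds a strong reference to its own key:

    import java.util.Map;
    import java.util.WeakHashMap;

    class PageEntry {
        final String url;   // strong reference back to the key: the key never becomes GC-eligible
        final String page;

        PageEntry(String url, String page) {
            this.url = url;
            this.page = page;
        }
    }

    class LeakyPageCache {
        // Looks self-cleaning, but every value pins its own key, so entries are never collected.
        private final Map<String, PageEntry> cache = new WeakHashMap<>();

        void put(String url, String page) {
            cache.put(url, new PageEntry(url, page));
        }
    }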

Object creation & Strings:

  • say you want to display the text of the pages only (~remove HTML tags). Write a function that does it line by line and appends each line to a growing result. In the first (unoptimized) version the result should be a plain String, so appending will cost a lot of time and object allocation. You can detect this problem from the performance point of view (why are the appends so slow?) and from the object-creation point of view (why did we create so many Strings, StringBuffers, char arrays, etc.?).
    Solution: use a StringBuilder for the result (sketched below).
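
A minimal sketch of both variants; `stripTags` is a naive placeholder here, since the point is the accumulation strategy rather than the tag removal itself:

    import java.util.List;

    class TextExtractor {
        // Naive placeholder for the exercise; a real tag stripper would be more careful.
        static String stripTags(String line) {
            return line.replaceAll("<[^>]*>", "");
        }

        // Slow: each += copies the whole accumulated result into a brand-new String.
        static String extractTextSlow(List<String> lines) {
            String result = "";
            for (String line : lines) {
                result += stripTags(line) + "\n";
            }
            return result;
        }

        // Fast: StringBuilder appends into one growing buffer; a single String is built at the end.
        static String extractTextFast(List<String> lines) {
            StringBuilder result = new StringBuilder();
            for (String line : lines) {
                result.append(stripTags(line)).append('\n');
            }
            return result.toString();
        }
    }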

Concurrency:

  • You want to speed the whole thing up by doing the downloading/filtering in parallel. Create some threads and run your code on them, but do everything inside one big synchronized block (locked on the cache), just "to protect the cache from concurrency problems". The effect is that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
    Solution: synchronize only around the cache operations (e.g. use `java.util.Collections.synchronizedMap()`), as sketched after this list.

  • Synchronize every tiny little piece of code. This should kill performance and probably prevent normal parallel execution. If you are lucky/smart enough you can come up with a deadlock as well. The moral: synchronization should not be an ad hoc thing added on an "it will not hurt" basis, but a well-thought-out decision.
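
A minimal sketch of the narrow-locking fix from the first item, reusing the same hypothetical `download` helper; only the map accesses are synchronized, and the expensive download runs without holding any lock:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    class ConcurrentPageFetcher {
        // Individual map operations are synchronized; no thread holds the lock while downloading.
        private final Map<String, String> cache =
                Collections.synchronizedMap(new HashMap<>());

        String fetch(String url) {
            String page = cache.get(url);  // short, synchronized lookup
            if (page == null) {
                page = download(url);      // slow work done outside any lock
                cache.put(url, page);      // short, synchronized update
            }
            return page;
        }

        // Hypothetical stand-in for the real download.
        private String download(String url) {
            return "<html>stub for " + url + "</html>";
        }
    }

(Two threads may occasionally download the same URL twice with this scheme; for a cache that is usually an acceptable trade-off.)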

Bonus exercise:

Fill up your cache at the beginning and don't do much allocation afterward, but still have a small leak somewhere. Usually this pattern is not too easy to catch. You can use the "bookmark" or "watermark" feature of the profiler, setting the mark right after the caching is done.

Answered by Sandor Murakozi


Don't ignore this method; it works very well for any language and OS, for these reasons. An example is here. Also, try to use examples with I/O and significant call depth; don't just use little CPU-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, it should illustrate most of your points.

Let's see:

  • Resolve memory leaks.
    The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percent of time in "new" for some objects.

  • Optimize loops.
    Generally loops don't need to be optimized unless there's very little done inside them (and they take a good percent of time).

  • Optimize object creation and management.
    The basic approach here is: keep data structures as simple as humanly possible. In particular, stay away from notification-style attempts to keep data consistent, because those run away and make the call tree enormously bushy. This is a major reason for performance problems in big software.

  • Optimize string operations.
    Use StringBuilder, but don't sweat code that doesn't account for a solid percentage of execution time.

  • Concurrency.
    Concurrency has two purposes.
    1) Performance, but this only works to the extent that it allows multiple pieces of hardware to get cranking at the same time. If the hardware isn't there, it doesn't help. It hurts.
    2) Clarity of expression, so for example UI code doesn't have to worry about heavy calculation or network I/O going on at the same time.

In any case, it can't be emphasized enough: don't do any optimization before you've proved that something takes a significant percentage of time.

Answered by Mike Dunlavey