 

At what point is it worth reusing arrays in Java?

How big does a buffer need to be in Java before it's worth reusing?

Or, put another way: I can repeatedly allocate, use, and discard byte[] objects, OR run a pool to keep and reuse them. I might allocate a lot of small buffers that get discarded often, or a few big ones that don't. At what size is it cheaper to pool them than to reallocate, and how do small allocations compare to big ones?

EDIT:

Ok, specific parameters. Say an Intel Core 2 Duo CPU, latest VM version for OS of choice. This question isn't as vague as it sounds... a little code and a graph could answer it.

EDIT2:

You've posted a lot of good general rules and discussions, but the question really asks for numbers. Post 'em (and code too)! Theory is great, but the proof is the numbers. It doesn't matter if results vary some from system to system, I'm just looking for a rough estimate (order of magnitude). Nobody seems to know if the performance difference will be a factor of 1.1, 2, 10, or 100+, and this is something that matters. It is important for any Java code working with big arrays -- networking, bioinformatics, etc.

Suggestions to get a good benchmark:

  1. Warm up code before running it in the benchmark. Methods should all be called at least 10000 times to get full JIT optimization.
  2. Make sure benchmarked methods run for at least 10 seconds, and use System.nanoTime() if possible, to get accurate timings.
  3. Run the benchmark on a system that is only running minimal applications.
  4. Run the benchmark 3-5 times and report all times, so we see how consistent it is (a rough sketch following these guidelines is shown below).
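
By way of illustration, here is a minimal sketch of what such an allocate-vs-reuse benchmark could look like. The class name, buffer sizes, and iteration counts are arbitrary choices of mine, and it deliberately ignores real-world GC pressure from live objects (and a clever JIT may eliminate non-escaping allocations entirely, so take the numbers with salt):

    public class AllocBench {

        // Touch the buffer so the JIT cannot optimize the work away entirely.
        static long use(byte[] buf) {
            buf[0] = 1;
            buf[buf.length - 1] = 2;
            return buf[0] + buf[buf.length - 1];
        }

        // Allocate a fresh buffer on every iteration.
        static long allocate(int size, int iterations) {
            long sink = 0;
            for (int i = 0; i < iterations; i++) {
                sink += use(new byte[size]);
            }
            return sink;
        }

        // Reuse one buffer, as a pool would.
        static long reuse(int size, int iterations) {
            byte[] buf = new byte[size];
            long sink = 0;
            for (int i = 0; i < iterations; i++) {
                sink += use(buf);
            }
            return sink;
        }

        public static void main(String[] args) {
            int[] sizes = {8 * 1024, 32 * 1024, 128 * 1024};
            int iterations = 100000;
            long sink = 0;
            for (int size : sizes) {
                // Warm-up so both paths get full JIT optimization.
                sink += allocate(size, 10000);
                sink += reuse(size, 10000);
                for (int run = 0; run < 5; run++) {
                    long t0 = System.nanoTime();
                    sink += allocate(size, iterations);
                    long t1 = System.nanoTime();
                    sink += reuse(size, iterations);
                    long t2 = System.nanoTime();
                    System.out.println(size / 1024 + " kB: alloc "
                            + (t1 - t0) / 1000000 + " ms, reuse "
                            + (t2 - t1) / 1000000 + " ms");
                }
            }
            System.out.println("(sink=" + sink + ")"); // keep results live
        }
    }

Run it with different heap sizes (-Xmx, e.g. -Xmx256m vs. -Xmx2g) to see how much the timings move with GC headroom.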

I know this is a vague and somewhat demanding question. I will check this question regularly, and answers will get comments and be voted up consistently. Lazy answers will not (see below for criteria). If I don't have any answers that are thorough, I'll attach a bounty. I might anyway, to reward a really good answer with a little extra.

What I know (and don't need repeated):

  • Java memory allocation and GC are fast and getting faster.
  • Object pooling used to be a good optimization, but now it hurts performance most of the time.
  • Object pooling is "not usually a good idea unless objects are expensive to create." Yadda yadda.

What I DON'T know:

  • How fast should I expect memory allocations to run (MB/s) on a standard modern CPU?
  • How does allocation size affect allocation rate?
  • What's the break-even point for number/size of allocations vs. re-use in a pool?

Routes to an ACCEPTED answer (the more the better):

  • A recent whitepaper showing figures for allocation & GC on modern CPUs (recent as in last year or so, JVM 1.6 or later)
  • Code for a concise and correct micro-benchmark I can run
  • Explanation of how and why the allocations impact performance
  • Real-world examples/anecdotes from testing this kind of optimization

The Context:

I'm working on a library adding LZF compression support to Java. This library extends the H2 DBMS LZF classes by adding additional compression levels (more compression) and compatibility with the byte streams from the C LZF library. One of the things I'm thinking about is whether it's worth trying to reuse the fixed-size buffers used to compress/decompress streams. The buffers may be ~8 kB or ~32 kB, and in the original version they're ~128 kB. Buffers may be allocated one or more times per stream. I'm trying to figure out how I want to handle buffers to get the best performance, with an eye toward potential multithreading in the future.

Yes, the library WILL be released as open source if anyone is interested in using this.

asked Dec 23 '09 by BobMcGee



2 Answers

If you want a simple answer, it is that there is no simple answer. No amount of calling answers (and by implication people) "lazy" is going to help.

How fast should I expect memory allocations to run (MB/s) on a standard modern CPU?

At the speed at which the JVM can zero memory, assuming that the allocation does not trigger a garbage collection. If it does trigger a garbage collection, the cost is impossible to predict without knowing which GC algorithm is used, the heap size and other parameters, and the application's working set of non-garbage objects over the lifetime of the app.

How does allocation size effect allocation rate?

See above.

What's the break-even point for number/size of allocations vs. re-use in a pool?

If you want a simple answer, it is that there is no simple answer.

The golden rule is, the bigger your heap is (up to the amount of physical memory available), the smaller the amortized cost of GC'ing a garbage object. With a fast copying garbage collector, the amortized cost of freeing a garbage object approaches zero as the heap gets larger. The cost of the GC is actually determined by (in simplistic terms) the number and size of non-garbage objects that the GC has to deal with.

Under the assumption that your heap is large, the lifecycle cost of allocating and GC'ing a large object (in one GC cycle) approaches the cost of zeroing the memory when the object is allocated.
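
To put illustrative (made-up) numbers on that: with a copying young collector, a collection costs roughly in proportion to the bytes that survive it, not the bytes allocated. Suppose 100 kB of live data survives each young collection. If the young space is 10 MB, you pay one 100 kB copy per ~10 MB of garbage allocated; grow the young space to 20 MB and you pay the same 100 kB copy only half as often, so the amortized GC cost per allocated byte is halved.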

EDIT: If all you want is some simple numbers, write a simple application that allocates and discards large buffers and run it on your machine with various GC and heap parameters and see what happens. But beware that this is not going to give you a realistic answer because real GC costs depend on an application's non-garbage objects.
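
For example (heap sizes and collector choices below are illustrative, and AllocBench stands for whatever simple allocation program you write), compare the -verbose:gc output of runs like:

    java -Xmx64m -verbose:gc AllocBench
    java -Xmx1g -verbose:gc AllocBench
    java -Xmx1g -XX:+UseSerialGC -verbose:gc AllocBench

The differences show how heap size and collector choice change the GC overhead for an identical allocation pattern.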

I'm not going to write a benchmark for you because I know that it would give you bogus answers.

EDIT 2: In response to the OP's comments.

So, I should expect allocations to run about as fast as System.arraycopy, or a fully JITed array initialization loop (about 1GB/s on my last bench, but I'm dubious of the result)?

Theoretically yes. In practice, it is difficult to measure in a way that separates the allocation costs from the GC costs.

By heap size, are you saying allocating a larger amount of memory for JVM use will actually reduce performance?

No, I'm saying it is likely to increase performance. Significantly. (Provided that you don't run into OS-level virtual memory effects.)

Allocations are just for arrays, and almost everything else in my code runs on the stack. It should simplify measuring and predicting performance.

Maybe. Frankly, I think that you are not going to get much improvement by recycling buffers.

But if you are intent on going down this path, create a buffer pool interface with two implementations. The first is a real thread-safe buffer pool that recycles buffers. The second is a dummy pool which simply allocates a new buffer each time alloc is called, and treats dispose as a no-op. Finally, allow the application developer to choose between the pool implementations via a setBufferPool method and/or constructor parameters and/or runtime configuration properties. The application should also be able to supply a buffer pool class / instance of its own making.
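
A minimal sketch of that design might look like the following. The names, the fixed-buffer-size restriction, and the unbounded free list are all my own illustrative choices, not a definitive implementation:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // The interface both strategies implement.
    interface BufferPool {
        byte[] alloc(int size);
        void dispose(byte[] buffer);
    }

    // Dummy pool: plain allocation; dispose is a no-op and the GC does the work.
    class AllocatingBufferPool implements BufferPool {
        public byte[] alloc(int size) {
            return new byte[size];
        }
        public void dispose(byte[] buffer) {
            // no-op
        }
    }

    // Thread-safe recycling pool for buffers of one fixed size.
    class RecyclingBufferPool implements BufferPool {
        private final int bufferSize;
        private final Queue<byte[]> free = new ConcurrentLinkedQueue<byte[]>();

        RecyclingBufferPool(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        public byte[] alloc(int size) {
            if (size > bufferSize) {
                return new byte[size]; // too big to pool: fall back to plain allocation
            }
            byte[] buf = free.poll();
            return (buf != null) ? buf : new byte[bufferSize];
        }

        public void dispose(byte[] buffer) {
            if (buffer.length == bufferSize) {
                free.offer(buffer); // note: the free list is unbounded in this sketch
            }
        }
    }

Because both strategies sit behind the same interface, the compression code never knows which one it is using, and you can benchmark recycling against plain allocation by swapping the implementation.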

answered by Stephen C


When it is larger than the young space.

If your array is larger than the thread-local young space (the thread-local allocation buffer, or TLAB), it may be allocated directly in the old space. Garbage collection on the old space is way slower than on the young space. So if your array is larger than the young space, it might make sense to reuse it.

On my machine, 32 kB exceeds the young space, so it would make sense to reuse buffers of that size.
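
If you want to check the pool sizes on your own JVM, a small sketch using the standard java.lang.management API can print them (pool names vary by collector, e.g. "Eden Space" vs. "PS Eden Space"):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    // Prints each memory pool's maximum size, so you can see how big the
    // young-generation spaces (eden, survivors) are on this JVM.
    public class PrintMemoryPools {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                long max = pool.getUsage().getMax(); // -1 if undefined
                System.out.println(pool.getName() + ": max=" + max + " bytes");
            }
        }
    }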

answered by akuhn