I was profiling a Java application and discovered that object allocations were happening considerably more slowly than I'd expect. I ran a simple benchmark to establish the overall speed of small-object allocations, and found that allocating a small object (a vector of 3 floats) seems to take about 200 nanoseconds on my machine. I'm running on a (dual-core) 2.0 GHz processor, so that is roughly 400 CPU cycles. I wanted to ask people here who have profiled Java applications before whether that sort of speed is to be expected. It seems a little cruel and unusual to me. After all, I would think that a language like Java, which can compact the heap and relocate objects, would have object allocation look something like the following:
int obj_addr = heap_ptr;
heap_ptr += some_constant_size_of_object;
return obj_addr;
...which is a couple of lines of assembly. As for garbage collection, I don't allocate or discard enough objects for that to come into play. When I optimize my code by re-using objects, I get performance on the order of 15 nanoseconds per object I need to process instead of 200 ns, so re-using objects hugely improves performance. I'd really like not to reuse objects, because that makes the notation kind of hairy (many methods need to accept a receptacle argument instead of returning a value).
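To make the trade-off concrete, the two styles I'm comparing look something like this (the `Vec3` class and method names are made up for illustration, not my actual code):

```java
// Hypothetical 3-float vector, just to show the two calling styles.
class Vec3 {
    float x, y, z;
}

class Example {
    // Allocating style: clean notation, but one new object per call.
    static Vec3 add(Vec3 a, Vec3 b) {
        Vec3 r = new Vec3();
        r.x = a.x + b.x;
        r.y = a.y + b.y;
        r.z = a.z + b.z;
        return r;
    }

    // Reuse style: the caller supplies a "receptacle" to fill in,
    // so no allocation happens inside the method.
    static void add(Vec3 a, Vec3 b, Vec3 out) {
        out.x = a.x + b.x;
        out.y = a.y + b.y;
        out.z = a.z + b.z;
    }
}
```

The second form is what gets me down to ~15 ns per object, but every call site has to manage the receptacle's lifetime.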
So the question is: is it normal that object allocation is taking so long? Or might something be wrong on my machine that, once fixed, might allow me to have better performance on this? How long do small-object allocations typically take for others, and is there a typical value? I'm using a client machine and not using any compile flags at the moment. If things are faster on your machine, what is your machine's JVM version and operating system?
I realize that individual mileage may vary greatly when it comes to performance, but I'm just asking whether the numbers I'm mentioning above seem like they're in the right ballpark.
Creating objects is very fast when the object is small and there is no GC cost.
public class AllocationBenchmark {
    public static void main(String[] args) {
        final int batch = 1000 * 1000;
        Double[] doubles = new Double[batch];
        long start = System.nanoTime();
        for (int j = 0; j < batch; j++)
            doubles[j] = (double) j; // each iteration boxes a new Double
        long time = System.nanoTime() - start;
        System.out.printf("Average object allocation took %.1f ns.%n", (double) time / batch);
    }
}
prints with -verbosegc
Average object allocation took 13.0 ns.
Note: no GCs occurred. However, increase the batch size and the program has to wait while the GC copies memory around.
final int batch = 10 * 1000 * 1000;
prints
[GC 96704K->94774K(370496K), 0.0862160 secs]
[GC 191478K->187990K(467200K), 0.4135520 secs]
[Full GC 187990K->187974K(618048K), 0.2339020 secs]
Average object allocation took 78.6 ns.
I suspect your allocation appears slow because you are triggering GCs. One way around this is to increase the memory available to the application (though this may just delay the cost).
If I run it again with -verbosegc -XX:NewSize=1g
Average object allocation took 9.1 ns.
I don't know how you measured the allocation time. The allocation fast path is probably inlined to at least the equivalent of
intptr_t obj_addr = heap_ptr;
heap_ptr += CONSTANT_SIZE;
if (heap_ptr > young_region_limit)
    call_the_garbage_collector();
return obj_addr;
But it is more complex than that, because you have to fill in the object at obj_addr; then some JIT compilation or class loading may happen; and very probably the first few words are initialized (e.g. the class pointer and the hash code, which may involve some random-number generation...), and the object constructors are called. They may require synchronization, etc.
And more importantly, a freshly allocated object is perhaps not in the nearest level-one cache, so some cache misses may happen.
So while I am not a Java expert, I am not surprised by your measurements. I do believe that allocating fresh objects makes your code cleaner and more maintainable than trying to reuse older objects.
Yes. The difference between what you think it should do and what it actually does can be pretty large. Pooling may be messy, but when allocation and garbage collection are a large fraction of execution time, which they certainly can be, pooling is a big win, performance-wise.
The objects to pool are the ones you most often find yourself in the process of allocating, as seen via stack samples.
Here's what such a sample looks like in C++. In Java the details are different, but the idea's the same:
... blah blah system stuff ...
MSVCRTD! 102129f9()
MSVCRTD! 1021297f()
operator new() line 373 + 22 bytes
operator new() line 65 + 19 bytes
COpReq::Handler() line 139 + 17 bytes <----- here is the line that's doing it
doit() line 346 + 12 bytes
main() line 367
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 7c817077()
V------ and that line shows what's being allocated
COperation* pOp = new COperation(iNextOp++, jobid);
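In Java, the pooling fix for a hot allocation site like that can be sketched as a simple free list (the class and method names below are hypothetical, not from the sample above):

```java
import java.util.ArrayDeque;

// The pooled object: recycled instances are re-initialized instead of
// being allocated fresh each time.
class COperation {
    int opId, jobId;
    void init(int opId, int jobId) {
        this.opId = opId;
        this.jobId = jobId;
    }
}

// Minimal single-threaded free-list pool.
class OpPool {
    private final ArrayDeque<COperation> free = new ArrayDeque<>();

    COperation acquire(int opId, int jobId) {
        COperation op = free.poll();     // try to recycle first
        if (op == null)
            op = new COperation();       // pool empty: allocate
        op.init(opId, jobId);            // reset state either way
        return op;
    }

    void release(COperation op) {        // caller hands the object back when done
        free.push(op);
    }
}
```

The messiness the answer mentions is real: every call site must remember to `release`, and a forgotten release turns into a leak (or, worse, a double release hands out the same live object twice).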