Best practice for creating millions of small temporary objects

Tags:

java

garbage-collection

What are the "best practices" for creating (and releasing) millions of small objects?

I am writing a chess program in Java and the search algorithm generates a single "Move" object for each possible move, and a nominal search can easily generate over a million move objects per second. The JVM GC has been able to handle the load on my development system, but I'm interested in exploring alternative approaches that would:

1. Minimize the overhead of garbage collection, and
2. Reduce the peak memory footprint for lower-end systems.

A vast majority of the objects are very short-lived, but about 1% of the moves generated are persisted and returned as the persisted value, so any pooling or caching technique would have to provide the ability to exclude specific objects from being re-used.

I don't expect fully fleshed-out example code, but I would appreciate suggestions for further reading/research, or open source examples of a similar nature.

asked May 07 '13 12:05

Humble Programmer

9 Answers

Run the application with verbose garbage collection:

java -verbose:gc

And it will tell you when it collects. There are two types of collection: a fast (young generation) one and a full one.

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

The numbers on either side of the arrow are the heap usage before and after the collection; the number in parentheses is the total heap size.

As long as it is just doing GC and not a full GC, you are home safe. The regular GC is a copying collector in the 'young generation', so objects that are no longer referenced are simply forgotten about, which is exactly what you would want.

Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.

answered Sep 27 '22 22:09

Niels Bech Nielsen

Since version 6, the server mode of the JVM employs an escape analysis technique. For objects that never escape the method that creates them, it can let you avoid GC altogether.
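
To make that concrete, here is a minimal sketch of the kind of code escape analysis is aimed at. The `Vec2` and `EscapeDemo` classes are hypothetical and exist only for illustration. Whether the JIT actually removes the allocations depends on the JVM version, inlining decisions and warm-up, so treat this as something to measure rather than a guarantee.

```java
// Hypothetical value class used only for illustration.
final class Vec2 {
    final int x;
    final int y;

    Vec2(int x, int y) {
        this.x = x;
        this.y = y;
    }

    Vec2 plus(Vec2 other) {
        return new Vec2(x + other.x, y + other.y);
    }
}

final class EscapeDemo {
    // The Vec2 instances created here are never stored in a field or array
    // and are not returned from this method. Once plus() is inlined, none of
    // them escapes, so the JIT is allowed (but not guaranteed) to skip the
    // heap allocations entirely.
    static long sumOfCoordinates(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Vec2 v = new Vec2(i, 1).plus(new Vec2(1, i));
            total += v.x + v.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoordinates(1_000_000));
    }
}
```

One way to check the effect on your own machine is to run with GC logging once normally and once with the HotSpot flag `-XX:-DoEscapeAnalysis`, then compare the number of young collections.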

answered Sep 27 '22 22:09

Mikhail

Well, there are several questions in one here!

1 - How are short-lived objects managed?

As previously stated, the JVM can deal perfectly well with a huge number of short-lived objects, since its design relies on the weak generational hypothesis (most objects die young).

Note that we are speaking of objects that reached main memory (the heap). This is not always the case. A lot of the objects you create do not even leave a CPU register. For instance, consider this for-loop

for (int i = 0; i < max; i++) {
  // stuff that implies i
}

Let's not think about loop unrolling (an optimisation that the JVM heavily performs on your code). If max is equal to Integer.MAX_VALUE, your loop might take some time to execute. However, the i variable will never escape the loop block. Therefore the JVM will put that variable in a CPU register and regularly increment it, but will never send it back to main memory.

So, creating millions of objects is not a big deal if they are used only locally. When the JIT can prove that, they will be dead before ever being stored in Eden, so the GC won't even notice them.

2 - Is it useful to reduce the overhead of the GC?

As usual, it depends.

First, you should enable GC logging to get a clear view of what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails (on Java 9 and later, unified logging replaces these flags: -Xlog:gc*:file=gc.log).

If your application is spending a lot of time in GC cycles, then yes, tune the GC; otherwise, it might not really be worth it.

For instance, if you have a young GC every 100 ms that takes 10 ms, you spend 10% of your time in the GC, and you have 10 collections per second (which is huge). In such a case, I would not spend any time on GC tuning, since those 10 collections per second would still be there; reducing the allocation rate is the more promising fix.
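
The 10 ms out of 100 ms arithmetic can also be checked from inside the running program. The sketch below uses the standard `java.lang.management` beans; the `GcOverhead` class and its method names are my own, and the reported time is the collectors' accumulated elapsed time, which is only an approximation of pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    // Fraction of wall-clock time spent in GC, e.g. 10 ms out of 100 ms -> 0.10.
    static double fraction(long gcMillis, long elapsedMillis) {
        return elapsedMillis <= 0 ? 0.0 : (double) gcMillis / elapsedMillis;
    }

    // Accumulated collection time reported by all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead so far: %.1f%%%n",
                100 * fraction(totalGcMillis(), uptime));
    }
}
```

Calling `totalGcMillis()` before and after a search and dividing by the elapsed time gives a rough per-search figure without having to parse the GC log.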

3 - Some experience

I had a similar problem with an application that was creating a huge number of instances of a given class. In the GC logs, I noticed that the allocation rate of the application was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second?!).

The problem: too many GCs, too frequently, caused by too many objects being created.

In my case, I attached a memory profiler and noticed that one class represented a huge percentage of all my objects. I tracked down the instantiations and found that this class was basically a pair of booleans wrapped in an object. In that case, two solutions were available:

- Rework the algorithm so that I do not return a pair of booleans but instead have two methods that each return one boolean
- Cache the objects, knowing that there were only 4 different instances

I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to add a factory with a non-thread-safe cache (I did not need thread safety since I would eventually have only 4 different instances).

The allocation rate went down to 1 GB/s, and the frequency of young GC dropped with it (divided by 3).

Hope that helps!
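
For illustration, a factory along those lines could look like the sketch below. The `BoolPair` class and its field names are hypothetical, not the code from my application. This variant builds all four instances eagerly in a static initializer, which also happens to make it thread-safe, whereas the cache I described was filled lazily.

```java
// Hypothetical immutable pair of booleans; only four distinct values exist.
final class BoolPair {
    // Index = (first ? 2 : 0) + (second ? 1 : 0)
    private static final BoolPair[] CACHE = {
        new BoolPair(false, false),
        new BoolPair(false, true),
        new BoolPair(true, false),
        new BoolPair(true, true),
    };

    final boolean first;
    final boolean second;

    // Private so that callers cannot bypass the cache.
    private BoolPair(boolean first, boolean second) {
        this.first = first;
        this.second = second;
    }

    // Always returns one of the four shared instances, never a new object.
    static BoolPair of(boolean first, boolean second) {
        return CACHE[(first ? 2 : 0) + (second ? 1 : 0)];
    }
}
```

Because the instances are immutable and shared, callers can compare them with `==` and nothing is allocated on the hot path.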

answered Sep 27 '22 21:09

Pierre Laporte

If you have just value objects (that is, no references to other objects) and really, really tons and tons of them, you can use direct ByteBuffers with native byte ordering [the latter is important], and you need a few hundred lines of code for allocation/reuse plus getters/setters. Getters look similar to long getQuantity(int tupleIndex){return buffer.getLong(tupleIndex+QUANTITY_OFFSET);}

That would solve the GC problem almost entirely, as long as you allocate only once, that is, one huge chunk, and then manage the objects yourself. Instead of references you'd have only an index (that is, an int) into the ByteBuffer that has to be passed along. You may need to do the memory alignment yourself as well.

The technique would feel like using C and void*, but with some wrapping it's bearable. A performance downside could be bounds checking if the compiler fails to eliminate it. A major upside is locality if you process the tuples like vectors; the lack of an object header reduces the memory footprint as well.

Other than that, it's likely you won't need such an approach, as collecting a young generation full of dead objects is trivially cheap on virtually all JVMs and the allocation cost is just a pointer bump. Allocation cost can be a bit higher if you use final fields, as they require a memory fence on some platforms (namely ARM/Power); on x86 it is free, though.
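
A stripped-down sketch of the idea, far short of the few hundred lines a real version needs: the `TupleStore` class, the 16-byte layout and the `price` field are assumptions made up for the example, and here the int handle is a slot number that gets multiplied by the tuple size rather than a raw byte offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed layout per tuple: [long quantity][int price][4 bytes padding].
final class TupleStore {
    private static final int TUPLE_SIZE = 16;
    private static final int QUANTITY_OFFSET = 0;
    private static final int PRICE_OFFSET = 8;

    private final ByteBuffer buffer;
    private final int capacity;
    private int next; // slot number of the next unused tuple

    TupleStore(int capacity) {
        this.capacity = capacity;
        // One big off-heap allocation, done once, in native byte order.
        this.buffer = ByteBuffer.allocateDirect(capacity * TUPLE_SIZE)
                                .order(ByteOrder.nativeOrder());
    }

    // Hands out a slot number that plays the role of an object reference.
    int allocate() {
        if (next == capacity) {
            throw new IllegalStateException("store is full");
        }
        return next++;
    }

    // Releases every tuple at once, e.g. at the end of a processing batch.
    void reset() {
        next = 0;
    }

    long getQuantity(int tupleIndex) {
        return buffer.getLong(tupleIndex * TUPLE_SIZE + QUANTITY_OFFSET);
    }

    void setQuantity(int tupleIndex, long value) {
        buffer.putLong(tupleIndex * TUPLE_SIZE + QUANTITY_OFFSET, value);
    }

    int getPrice(int tupleIndex) {
        return buffer.getInt(tupleIndex * TUPLE_SIZE + PRICE_OFFSET);
    }

    void setPrice(int tupleIndex, int value) {
        buffer.putInt(tupleIndex * TUPLE_SIZE + PRICE_OFFSET, value);
    }
}
```

The bump-and-reset allocator is the simplest possible policy; reusing individual slots would need a free list of indices on top of this.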

answered Sep 27 '22 23:09

3 revs, 2 users 69%

Assuming you find GC is an issue (as others point out, it might not be), you will be implementing your own memory management for your special case, i.e. a class which suffers massive churn. Give object pooling a go; I've seen cases where it works quite well. Implementing object pools is a well-trodden path, so there is no need to revisit it here, but look out for:

- multi-threading: using thread-local pools might work for your case
- backing data structure: consider using ArrayDeque, as it performs well on remove and has no allocation overhead
- limit the size of your pool :)

Measure before and after, etc.
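
As a rough sketch of those three points, assuming a hypothetical mutable `PooledMove` class with `from` and `to` fields (not taken from the question's code):

```java
import java.util.ArrayDeque;

// Hypothetical mutable move; the field names are illustrative only.
final class PooledMove {
    int from;
    int to;
}

final class MovePool {
    // One pool per thread, so no synchronization is needed.
    static final ThreadLocal<MovePool> LOCAL =
            ThreadLocal.withInitial(() -> new MovePool(256));

    private final ArrayDeque<PooledMove> free = new ArrayDeque<>();
    private final int maxSize;

    MovePool(int maxSize) {
        this.maxSize = maxSize;
    }

    // Reuses a pooled instance if one is available, otherwise allocates.
    PooledMove acquire(int from, int to) {
        PooledMove move = free.pollLast();
        if (move == null) {
            move = new PooledMove();
        }
        move.from = from;
        move.to = to;
        return move;
    }

    // Call only for moves that nothing else references any more.
    // Moves that must persist are simply never released.
    void release(PooledMove move) {
        if (free.size() < maxSize) {
            free.addLast(move);
        }
        // Otherwise drop it and let the GC reclaim it.
    }

    int pooled() {
        return free.size();
    }
}
```

The requirement from the question (excluding the roughly 1% of moves that are kept) is handled by convention here: a move that is returned from the search is never passed to `release`, so it can never be handed out again.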

answered Sep 27 '22 23:09

Nitsan Wakart

I've met a similar problem. First of all, try to reduce the size of the small objects. We introduced some shared default field values and referenced them from each object instance instead of creating new ones.

For example, MouseEvent has a reference to the Point class. We cached Points and referenced them instead of creating new instances. The same goes for, say, empty strings.

Another source was multiple booleans, which were replaced with one int; for each boolean we use just one bit of the int.
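
A small sketch of packing several booleans into a single int, here with one bit per flag. The `PieceFlags` class and the flag names are invented for the example and are not from our code base.

```java
final class PieceFlags {
    // Each flag occupies one bit of the int.
    static final int MOVED = 1 << 0;
    static final int CAPTURED = 1 << 1;
    static final int PROMOTED = 1 << 2;

    private PieceFlags() {
    }

    // Returns flags with the given bit switched on.
    static int set(int flags, int mask) {
        return flags | mask;
    }

    // Returns flags with the given bit switched off.
    static int clear(int flags, int mask) {
        return flags & ~mask;
    }

    static boolean isSet(int flags, int mask) {
        return (flags & mask) != 0;
    }
}
```

An object that previously held three boolean fields then holds one int field, for example `flags = PieceFlags.set(flags, PieceFlags.MOVED)`.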

answered Sep 27 '22 21:09

StanislavL

I dealt with this scenario with some XML processing code some time ago. I found myself creating millions of XML tag objects which were very small (usually just a string) and extremely short-lived (failure of an XPath check meant no-match so discard).

I did some serious testing and came to the conclusion that I could only achieve about a 7% improvement in speed by reusing tags from a list of discarded ones instead of making new ones. However, once implemented, I found that the free queue needed a mechanism to prune it if it got too big. This completely nullified my optimisation, so I turned it into an option.

In summary - probably not worth it - but I'm glad to see you are thinking about it, it shows you care.

answered Sep 27 '22 23:09

OldCurmudgeon

Given that you are writing a chess program there are some special techniques you can use for decent performance. One simple approach is to create a large array of longs (or bytes) and treat it as a stack. Each time your move generator creates moves it pushes a couple of numbers onto the stack, e.g. move from square and move to square. As you evaluate the search tree you will be popping off moves and updating a board representation.
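
A minimal sketch of such a stack, under my own assumptions: squares are numbered 0 to 63, a move is packed into a single int (6 bits per square) rather than pushed as two separate numbers, and the `MoveStack` name is made up. A real engine would also pack promotion, capture and castling information.

```java
final class MoveStack {
    private final int[] moves;
    private int size;

    MoveStack(int capacity) {
        moves = new int[capacity];
    }

    // Squares are 0..63, so each one fits in 6 bits.
    static int encode(int from, int to) {
        return (from << 6) | to;
    }

    static int from(int move) {
        return (move >>> 6) & 0x3F;
    }

    static int to(int move) {
        return move & 0x3F;
    }

    // The move generator pushes moves; no objects are created.
    void push(int from, int to) {
        moves[size++] = encode(from, to);
    }

    // The search pops moves as it evaluates them.
    int pop() {
        return moves[--size];
    }

    int size() {
        return size;
    }
}
```

A typical pattern is to remember `size()` before generating the moves for one ply and to pop back down to that mark when the ply is finished, so one preallocated array serves the whole search.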

If you want expressive power, use objects. If you want speed (in this case), go with primitives.

answered Sep 27 '22 21:09

David Plumpton

One solution I've used for such search algorithms is to create just one Move object, mutate it with each new move, and then undo the move before leaving the scope. You are probably analyzing just one move at a time, and then just storing the best move somewhere.
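
A toy sketch of the reuse part of that idea. The `ReusableMove` and `ToySearch` classes and the scoring rule (value of the target square) are invented for the example, and applying and undoing the move on a board is left out entirely.

```java
// Hypothetical mutable move that is overwritten for every candidate.
final class ReusableMove {
    int from;
    int to;

    void set(int from, int to) {
        this.from = from;
        this.to = to;
    }
}

final class ToySearch {
    // Scores each candidate by the value of its target square and returns
    // the best one. Only two ReusableMove objects are ever allocated, no
    // matter how many candidates are examined.
    static ReusableMove best(int[] fromSquares, int[] toSquares, int[] squareValue) {
        ReusableMove scratch = new ReusableMove(); // the single working instance
        ReusableMove best = null;
        int bestScore = Integer.MIN_VALUE;
        for (int i = 0; i < fromSquares.length; i++) {
            scratch.set(fromSquares[i], toSquares[i]);
            int score = squareValue[scratch.to];
            if (score > bestScore) {
                bestScore = score;
                if (best == null) {
                    best = new ReusableMove();
                }
                // Copy the fields; never keep a reference to scratch itself.
                best.set(scratch.from, scratch.to);
            }
        }
        return best;
    }
}
```

The important detail is the copy: the stored best move must not alias the scratch object, or the next candidate would overwrite it.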

If that's not feasible for some reason, and you want to decrease peak memory usage, a good article about memory efficiency is here: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

answered Sep 27 '22 21:09

rkj
