
Java in-memory size optimization

I'm writing some "big data" software that needs to hold a lot of data in memory. I wrote a prototype in C++ that works great. However, the actual end users typically code in Java, so they've asked me to also write a Java prototype.

I've done some background reading on memory footprint in Java and run some preliminary tests. For example, let's say I have this object:

public class DataPoint {

    int cents, time, product_id, store_id;

    public DataPoint(int cents, int time, int product_id, int store_id) {
        this.cents = cents;
        this.time = time;
        this.product_id = product_id;
        this.store_id = store_id;
    }
}

In C++, sizeof this structure is 16 bytes, which makes sense. In Java we have to measure indirectly. If I create, e.g., 10 million of these objects, take Runtime.totalMemory() - Runtime.freeMemory() before and after, and divide by the object count, I get approximately 36 bytes per object. A ~2.25x memory difference is pretty nasty; it's going to get ugly when we try to hold hundreds of millions of DataPoints in memory.
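One plausible accounting for the 36 bytes, written out as arithmetic. This is only a sketch: it assumes a 64-bit HotSpot JVM with compressed references (12-byte object header, 8-byte alignment, 4-byte references), none of which is stated in the question, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate of the per-DataPoint footprint.
// ASSUMPTIONS (not from the question): 64-bit HotSpot with compressed
// references, i.e. 12-byte header, 8-byte alignment, 4-byte references.
final class FootprintEstimate {
    static final int HEADER_BYTES = 12;
    static final int ALIGNMENT_BYTES = 8;
    static final int REFERENCE_BYTES = 4;

    // Round a size up to the next multiple of the alignment.
    static int alignUp(int size) {
        return (size + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    static int bytesPerDataPoint() {
        int fields = 4 * Integer.BYTES;               // four ints: 16 bytes
        int object = alignUp(HEADER_BYTES + fields);  // 28 rounded up to 32
        // Plus the reference that keeps each object reachable,
        // for instance one slot in an array of DataPoints.
        return object + REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerDataPoint() + " bytes per DataPoint"); // prints "36 bytes per DataPoint"
    }
}
```

Under those assumptions, more than half of each entry is header, padding, and reference rather than data.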

I read somewhere that in cases like this in Java it's better to store the data as arrays, essentially a column-based store rather than a row-based store. I think I understand this: the column-based layout reduces the number of references, and perhaps the JVM can even pack the ints into 8-byte words intelligently.

What other tricks can I use for reducing the memory-footprint of what is essentially a memory block that has one very large dimension (millions/billions of datapoints) and one very small dimension (the O(1) number of columns/variables)?

It turns out that storing the data as four int arrays uses exactly 16 bytes per entry. The lesson: small objects carry a nasty proportional overhead in Java.
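A minimal sketch of what such a four-array layout could look like. The class name DataPointColumns, its methods, and the fixed capacity are assumptions made for illustration; the post does not show the actual code.

```java
// Column-oriented store: one int[] per field instead of one object per row.
// DataPointColumns and its methods are hypothetical names, not from the post.
final class DataPointColumns {
    private final int[] cents;
    private final int[] time;
    private final int[] productId;
    private final int[] storeId;
    private int size; // number of rows filled so far

    DataPointColumns(int capacity) {
        cents = new int[capacity];
        time = new int[capacity];
        productId = new int[capacity];
        storeId = new int[capacity];
    }

    // Appends one row and returns its index.
    int add(int cents, int time, int productId, int storeId) {
        if (size == this.cents.length) {
            throw new IllegalStateException("store is full");
        }
        this.cents[size] = cents;
        this.time[size] = time;
        this.productId[size] = productId;
        this.storeId[size] = storeId;
        return size++;
    }

    int size() { return size; }

    int cents(int row) { return cents[checkRow(row)]; }
    int time(int row) { return time[checkRow(row)]; }
    int productId(int row) { return productId[checkRow(row)]; }
    int storeId(int row) { return storeId[checkRow(row)]; }

    // Example of a scan over a single column.
    long totalCents() {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += cents[i];
        }
        return sum;
    }

    private int checkRow(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return row;
    }
}
```

Each row costs 4 x 4 = 16 bytes of array payload, and the four array headers are paid once for the whole store rather than once per data point.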

andyInCambridge asked Nov 03 '22 07:11



1 Answer

It isn't straightforward to see how much memory a data structure takes in Java. totalMemory() reports the space the VM has currently reserved for the heap, which is larger than what is actually in use, and totalMemory() - freeMemory() can still include garbage that has not been collected yet. You could try a Java profiler that shows the space consumption of your data structures; they are quite easy to set up and run. One handy free tool is Java's own VisualVM, which among other things shows the memory behaviour of your application. You will also learn a bit about how Java's GC works if you use it.
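If a quick in-code estimate is still wanted, a slightly more careful version of the Runtime approach might look like the sketch below. The class HeapProbe and its nested Point class are hypothetical, System.gc() is only a request that the JVM may ignore, and the result should be treated as a rough figure rather than a measurement.

```java
// Rough per-object heap estimate using Runtime. Approximate by nature:
// System.gc() is only a hint, and other allocations add noise.
// HeapProbe and Point are illustrative names, not from the post.
final class HeapProbe {

    // Stand-in for the question's DataPoint: four int fields.
    static final class Point {
        int cents, time, productId, storeId;
    }

    // Used heap after asking for a collection, so that leftover
    // garbage is less likely to be counted.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    static double bytesPerObject(int count) {
        long before = usedHeap();
        Point[] keep = new Point[count];
        for (int i = 0; i < count; i++) {
            keep[i] = new Point();
        }
        long after = usedHeap();
        // Touch the array after measuring so it stays reachable until here.
        if (keep[count - 1] == null) {
            throw new AssertionError("unreachable");
        }
        return (after - before) / (double) count;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerObject(1_000_000) + " bytes per object (approx.)");
    }
}
```

The figure includes the array slot that references each object, and it varies with JVM version and flags, which is one reason a profiler gives a clearer picture.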

VisualVM screenshot showing performance footprint (image from http://visualvm.java.net/features.html)

You should also consider making the variables final if possible. It allows the Java VM to optimize the code a bit better (I'm not sure whether it saves space, though).
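For reference, the question's class with final fields might look like the sketch below; ImmutableDataPoint is a made-up name. The modifier changes what the code may do with the fields, not which fields the object holds, so there is no obvious reason to expect a smaller object from this change alone.

```java
// The question's DataPoint with final fields (hypothetical class name).
// Each field is assigned exactly once, in the constructor.
final class ImmutableDataPoint {
    final int cents;
    final int time;
    final int productId;
    final int storeId;

    ImmutableDataPoint(int cents, int time, int productId, int storeId) {
        this.cents = cents;
        this.time = time;
        this.productId = productId;
        this.storeId = storeId;
    }
}
```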

Lycha answered Nov 10 '22 01:11
