Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I allocate objects contiguously in java?

Assume I have a large array of relatively small objects, which I need to iterate frequently.
I would like to optimize my iteration by improving cache performance, so I would like to allocate the objects [and not the reference] contiguously on the memory, so I'll get fewer cache misses, and the overall performance could be segnificantly better.

In C++, I could just allocate an array of the objects, and it will allocate them as I wanted, but in java - when allocating an array, I only allocate the reference, and the allocation is being done one object at a time.

I am aware that if I allocate the objects "at once" [one after the other], the jvm is most likely to allocate the objects as contiguous as it can, but it might be not enough if the memory is fragmented.

My questions:

  1. Is there a way to tell the jvm to defrag the memory just before I start allocating my objects? Will it be enough to ensure [as much as possible] that the objects will be allocated continiously?
  2. Is there a different solution to this issue?
like image 796
amit Avatar asked Mar 09 '12 10:03

amit


1 Answers

New objects are creating in the Eden space. The eden space is never fragmented. It is always empty after a GC.

The problem you have is when a GC is performed, object can be arranged randomly in memory or even surprisingly in the reverse order they are referenced.

A work around is to store the fields as a series of arrays. I call this a column-based table instead of a row based table.

e.g. Instead of writing

class PointCount {
    double x, y;
    int count;
}

PointCount[] pc = new lots of small objects.

use columns based data types.

class PointCounts {
    double[] xs, ys;
    int[] counts;
}

or

class PointCounts {
    TDoubleArrayList xs, ys;
    TIntArrayList counts;
}

The arrays themselves could be in up to three different places, but the data is otherwise always continuous. This can even be marginally more efficient if you perform operations on a subset of fields.

public int totalCount() {
   int sum = 0;
   // counts are continuous without anything between the values.
   for(int i: counts) sum += i;
   return i;
}

A solution I use is to avoid GC overhead for having large amounts of data is to use an interface to access a direct or memory mapped ByteBuffer

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MyCounters {
    public static void main(String... args) {
        Runtime rt = Runtime.getRuntime();
        long used1 = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        int length = 100 * 1000 * 1000;
        PointCount pc = new PointCountImpl(length);
        for (int i = 0; i < length; i++) {
            pc.index(i);
            pc.setX(i);
            pc.setY(-i);
            pc.setCount(1);
        }
        for (int i = 0; i < length; i++) {
            pc.index(i);
            if (pc.getX() != i) throw new AssertionError();
            if (pc.getY() != -i) throw new AssertionError();
            if (pc.getCount() != 1) throw new AssertionError();
        }
        long time = System.nanoTime() - start;
        long used2 = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Creating an array of %,d used %,d bytes of heap and tool %.1f seconds to set and get%n",
                length, (used2 - used1), time / 1e9);
    }
}

interface PointCount {
    // set the index of the element referred to.
    public void index(int index);

    public double getX();

    public void setX(double x);

    public double getY();

    public void setY(double y);

    public int getCount();

    public void setCount(int count);

    public void incrementCount();
}

class PointCountImpl implements PointCount {
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = X_OFFSET + 8;
    static final int COUNT_OFFSET = Y_OFFSET + 8;
    static final int LENGTH = COUNT_OFFSET + 4;

    final ByteBuffer buffer;
    int start = 0;

    PointCountImpl(int count) {
        this(ByteBuffer.allocateDirect(count * LENGTH).order(ByteOrder.nativeOrder()));
    }

    PointCountImpl(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public void index(int index) {
        start = index * LENGTH;
    }

    @Override
    public double getX() {
        return buffer.getDouble(start + X_OFFSET);
    }

    @Override
    public void setX(double x) {
        buffer.putDouble(start + X_OFFSET, x);
    }

    @Override
    public double getY() {
        return buffer.getDouble(start + Y_OFFSET);
    }

    @Override
    public void setY(double y) {
        buffer.putDouble(start + Y_OFFSET, y);
    }

    @Override
    public int getCount() {
        return buffer.getInt(start + COUNT_OFFSET);
    }

    @Override
    public void setCount(int count) {
        buffer.putInt(start + COUNT_OFFSET, count);
    }

    @Override
    public void incrementCount() {
        setCount(getCount() + 1);
    }
}

run with the -XX:-UseTLAB option (to get accurate memory allocation sizes) prints

Creating an array of 100,000,000 used 12,512 bytes of heap and took 1.8 seconds to set and get

As its off heap, it has next to no GC impact.

like image 197
Peter Lawrey Avatar answered Sep 28 '22 13:09

Peter Lawrey