Determine the optimal size for an array with respect to the JVM's memory granularity

When creating the backing array for (e.g.) a collection, you do not really care about the exact size of the array you create; it only needs to be at least as large as the size you calculated.

But thanks to memory allocation granularity and the VM's array header, it would in some cases be possible to create a somewhat larger array without consuming any more memory. For the Oracle 32-bit VM (at least, that's what several sources on the internet claim), memory granularity is 8 bytes (meaning any memory allocation is rounded up to the next 8-byte boundary) and the array header overhead is 12 bytes.

That means allocating an Object[2] should consume 20 bytes (12 + 2 * 4), but it will actually take 24 bytes because of granularity. It would be possible to create an Object[3] for exactly the same memory cost, meaning a collection could delay resizing its backing array a little longer. The same principle could be applied to primitive arrays, e.g. byte[] used for I/O buffers, char[] in a StringBuilder, etc. A sketch of that arithmetic follows below.
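For illustration only, the arithmetic could be sketched like this; the 12-byte header, 4-byte reference size and 8-byte granularity are the assumed 32-bit figures from above, not values queried from the VM:

// Sketch only: header size, reference size and granularity are assumptions
// for the Oracle 32-bit VM described above, not queried from the JVM.
static int paddedObjectArrayBytes(int length) {
    int headerBytes = 12;     // assumed array header overhead
    int referenceBytes = 4;   // assumed reference size on a 32-bit VM
    int raw = headerBytes + referenceBytes * length;
    return (raw + 7) & ~7;    // round up to the next 8-byte boundary
}
// paddedObjectArrayBytes(2) == 24 and paddedObjectArrayBytes(3) == 24,
// so Object[3] costs no more memory than Object[2] under these assumptions.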

While such an optimization won't have a noticeable effect except under the most extreme circumstances, it wouldn't be much trouble to call a static method to "optimize" an array size.

The problem is, there is no such "round array size up to memory granularity" method in the JDK. And writing such a method myself would require determining some crucial parameters of the VM: memory granularity, array header overhead and, finally, the size of each element type (mainly a problem for references, since their size can vary with architecture and VM options).

So is there a method to determine these parameters, or achieve the desired "round up" by other means?

Asked Apr 22 '14 by Durandal

2 Answers

Interesting idea. I think that the more portable method of determining this would be to actually measure usage. Example program:

import sun.misc.Unsafe;

public class FindMemoryUsage {
    public static void main(String[] args) {
        for (int i=0; i<50; i+=2) {
            long actual = getActualUsageForN(i);
            System.out.println(i + " = " + actual);
            long theoretical = getTheoreticalUsageForN(i);
            if (theoretical != actual) {
                throw new RuntimeException("Uh oh! Mismatch!");
            }
        }
    }

    // Header size plus element storage, rounded up to the next 8-byte boundary.
    private static long getTheoreticalUsageForN(long count) {
        long optimal = (Unsafe.ARRAY_BYTE_BASE_OFFSET + Unsafe.ARRAY_BYTE_INDEX_SCALE * count);
        return ((optimal - 1) & ~7) + 8;
    }

    // Allocate a large number of arrays and divide the change in used heap
    // by the number of arrays to estimate the per-array footprint.
    private static long getActualUsageForN(int count) {
        System.gc();
        byte[][] arrays = new byte[3000000][];
        long begin = usedMemory();
        for (int i=0; i<arrays.length; i++) {
            arrays[i] = new byte[count];
        }
        long end = usedMemory();
        return Math.round((end - begin) / (double) arrays.length);
    }

    private static long usedMemory() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }
}

This program gives you this info:

0 = 16
2 = 16
4 = 16
6 = 24
8 = 24
10 = 24
12 = 24
14 = 32
16 = 32
18 = 32
20 = 32
22 = 40
24 = 40
26 = 40
28 = 40
30 = 48
32 = 48
34 = 48
36 = 48
38 = 56
40 = 56
42 = 56
44 = 56
46 = 64
48 = 64

This data comes from both the actual measurement of usage and the theoretical usage computed from sun.misc.Unsafe's constants and 8-byte rounding. This means that you could use these constants to "round up" like you suggested:

private static int roundSizeUp(int from) {
    // Bytes needed for the array header plus 'from' elements...
    long size = (Unsafe.ARRAY_BYTE_BASE_OFFSET + Unsafe.ARRAY_BYTE_INDEX_SCALE * from);
    // ...rounded up to the next 8-byte boundary...
    long actual = ((size - 1) & ~7) + 8;
    // ...and converted back into an element count.
    return (int) (actual - Unsafe.ARRAY_BYTE_BASE_OFFSET) / Unsafe.ARRAY_BYTE_INDEX_SCALE;
}

This is VM-specific code, but you could probably work out how to do this based on the getActualUsageForN strategy if you need more portability.

Note that this isn't production-quality code: you'd want to think carefully about overflows and change the Unsafe references to be the constants that actually apply to the type of array that you're working with.
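For instance, a version adapted to Object[] rather than byte[] might look like the following sketch; the ARRAY_OBJECT_* constants do exist in sun.misc.Unsafe, but the 8-byte rounding is still an assumption about the running VM:

import sun.misc.Unsafe;

class ObjectArraySizes {
    // Sketch only: assumes 8-byte allocation granularity, as measured above.
    static int roundObjectArraySizeUp(int from) {
        long size = Unsafe.ARRAY_OBJECT_BASE_OFFSET
                + (long) Unsafe.ARRAY_OBJECT_INDEX_SCALE * from;
        long padded = ((size - 1) & ~7) + 8;  // round up to an 8-byte boundary
        return (int) ((padded - Unsafe.ARRAY_OBJECT_BASE_OFFSET)
                / Unsafe.ARRAY_OBJECT_INDEX_SCALE);
    }
}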

Answered by Cel Skeggs


When dynamically sized collections increase the size of their backing array, they do not add a small amount to it; they grow it in proportion to its current size. Doubling is a common choice. They do this because it gives better performance. The tiny adjustment you suggest would not be worth the effort.
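For context, proportional growth amounts to something like the following sketch; the grow method and its names are hypothetical, not taken from any particular collection class:

import java.util.Arrays;

class GrowthSketch {
    // Sketch of doubling growth: the new capacity is at least twice the old
    // one, so the amortized copying cost per added element stays constant.
    static Object[] grow(Object[] backing, int neededCapacity) {
        int newCapacity = Math.max(neededCapacity, backing.length * 2);
        return Arrays.copyOf(backing, newCapacity);
    }
}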

Answered by Raedwald