 

Tweaking java classes for CPU cache friendliness

When designing java classes, what are the recommendations for achieving CPU cache friendliness?

What I have learned so far is that one should use primitive fields as much as possible (i.e. int instead of Integer). Thus, the data will be laid out consecutively inside the containing object when it is allocated. E.g.

class Local
{
    private int data0;
    private int data1;
    // ...
};

is more cache friendly than

class NoSoLocal
{
    private Integer data0;
    private Integer data1;
    //...
};

The latter will require two separate allocations for the Integer objects, which can end up at arbitrary locations in memory, especially after a GC run. On the other hand, the first approach might lead to duplication of data in cases where the data could be reused.

Is there a way to have them located close to each other in memory, so that the parent object and its contained elements will be in the CPU cache at once rather than distributed arbitrarily over the whole memory, and so that the GC will keep them together?

asked Feb 12 '23 by ogni42

2 Answers

You cannot force the JVM to place related objects close to each other (though the JVM tries to do this automatically). But there are certain tricks to make Java programs more cache-friendly.

Let me show you some examples from real-life projects.

BEWARE! This is not a recommended way to code in Java!
Do not adopt the following techniques unless you are absolutely sure why you are doing it.

  1. Inheritance over composition. You've definitely heard the contrary principle "Favor composition over inheritance". But with composition you have an extra reference to follow. This is not good for cache locality and also requires more memory. The classic example of inheritance over composition is the JDK 8 LongAdder and LongAccumulator classes, which extend the Striped64 utility class.

  2. Transform arrays of structures into a structure of arrays. This again saves memory and speeds up bulk operations on a single field, e.g. key lookups:

    class Entry {
        long key;
        Object value;
    }
    
    Entry[] entries;
    

    will be replaced with

    long[] keys;
    Object[] values;
    
  3. Flatten data structures by inlining. My favorite example is inlining a 160-bit SHA-1 hash that was represented by a byte[]. The code before:

    class Blob {
        long offset;
        int length;
        byte[] sha1_hash;
    }
    

    The code after:

    class Blob {
        long offset;
        int length;
        int hash0, hash1, hash2, hash3, hash4;
    }
    
  4. Replace String with char[]. You know, a String in Java contains a char[] object under the hood. Why pay the performance penalty for an extra reference?

  5. Avoid linked lists. Linked lists are very cache-unfriendly. Hardware works best with linear structures. LinkedList can often be replaced with ArrayList. A classic HashMap may be replaced with an open-addressing hash table.

  6. Use primitive collections. Trove is a high-performance library containing specialized lists, maps, sets etc. for primitive types.

  7. Build your own data layouts on top of arrays or ByteBuffers. A byte array is a perfect linear structure. To achieve the best cache locality you can pack object data manually into a single byte array.
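As a concrete sketch of item 3, here is one way the flattened Blob might be filled in from the original byte[20] hash. The fromBytes helper is hypothetical (not from the answer); it simply reads the 20 hash bytes as five big-endian ints:

```java
import java.nio.ByteBuffer;

// Blob with the 160-bit SHA-1 hash inlined as five ints (item 3).
class Blob {
    long offset;
    int length;
    int hash0, hash1, hash2, hash3, hash4;

    // Hypothetical conversion helper: turns the original byte[20]
    // representation into the five inlined int fields.
    static Blob fromBytes(long offset, int length, byte[] sha1) {
        ByteBuffer bb = ByteBuffer.wrap(sha1); // big-endian by default
        Blob b = new Blob();
        b.offset = offset;
        b.length = length;
        b.hash0 = bb.getInt();
        b.hash1 = bb.getInt();
        b.hash2 = bb.getInt();
        b.hash3 = bb.getInt();
        b.hash4 = bb.getInt();
        return b;
    }
}
```

After this change the hash lives inside the Blob object itself, so reading it costs no extra pointer dereference and no separate array-object header.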
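Item 7 might look like the following minimal sketch. The FlatTable class and its 12-byte record layout are hypothetical, just one way to pack key/value records into a single ByteBuffer so that scanning the keys reads consecutive memory instead of chasing references:

```java
import java.nio.ByteBuffer;

// Hypothetical manual layout: each record is [8-byte key][4-byte value],
// packed back to back in one flat buffer (item 7).
class FlatTable {
    private static final int RECORD = 12; // 8-byte key + 4-byte value
    private final ByteBuffer buf;
    private int count;

    FlatTable(int capacity) {
        buf = ByteBuffer.allocate(capacity * RECORD);
    }

    void add(long key, int value) {
        buf.putLong(count * RECORD, key);      // absolute put: no position state
        buf.putInt(count * RECORD + 8, value);
        count++;
    }

    // Linear scan over the packed keys: sequential, cache-friendly reads.
    int get(long key) {
        for (int i = 0; i < count; i++) {
            if (buf.getLong(i * RECORD) == key) {
                return buf.getInt(i * RECORD + 8);
            }
        }
        return -1; // assumes -1 is never a stored value
    }
}
```

The same idea scales up to real off-heap stores (e.g. using direct ByteBuffers), but the essential point is that all records occupy one contiguous allocation that the GC treats as a single object.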

answered Feb 15 '23 by apangin


the first approach might lead to duplication of data in cases where the data can be reused.

But not in the case you mention. An int is 4 bytes and a reference is typically 4 bytes, so you don't gain anything by using Integer. For a more complex type, it can make a big difference however.

Is there a way to have them located close to each other in memory, so that the parent object and its contained elements will be in the CPU cache at once rather than distributed arbitrarily over the whole memory, and so that the GC will keep them together?

The GC will do this anyway, provided the objects are only used in one place. If the objects are used in multiple places, they will be close to one reference.

Note: this is not guaranteed to be the case, however when allocating objects they will typically be contiguous in memory, as this is the simplest allocation strategy. When copying retained objects, the HotSpot GC will copy them in reverse order of discovery, i.e. they are still together but in reverse order.

Note 2: Using 4 bytes for an int is still going to be more efficient than using 28 bytes for an Integer (4 bytes for the reference, 16 bytes for the object header, 4 bytes for the value and 4 bytes for padding).

Note 3: Above all, you should favour clarity over performance, unless you have measured the need and have a more performant solution. In this case, an int cannot be null but an Integer can be. If you want a value which must not be null, use int, not for performance but for clarity.
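That last point can be sketched with a hypothetical class (not from the answer): the primitive field is stored inline and can never be null, while the boxed field costs a reference plus a separate heap object and may legitimately be null.

```java
// Hypothetical example: int documents "always has a value",
// Integer documents "may be absent" (at the cost of a reference
// plus a separately allocated boxed object).
class Counter {
    private int count;          // stored inline, never null
    private Integer lastError;  // reference to a boxed object, may be null

    void increment() { count++; }
    int count() { return count; }
    boolean hasError() { return lastError != null; }
}
```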

answered Feb 15 '23 by Peter Lawrey