Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most Compact way to Serialize an Array of Longs in Java

I have many arrays of longs long[] and I need to serialize them and save them to disk for later read, notice that each array has to be modified from time to time, but writes are infrequent while reads are frequent. Usually my application needs only a small number of those loaded into memory at the same time. Edits to each array can happen in batch in memory before the array are stored back to disk. Each array has from hundreds up to a million elements. In my application it is critical that loading the desired arrays into memory happens very fast.

In my case the long values in each array are, on average, quite near one to the other, i.e., the difference from one value to the next - if sorted within a single array- is less than an integer.

The solution adopting a trie-like structure as presented here seems not applicable for my case, since in that solution the array values are known and never change.

This solution here tells me to use ByteBuffer and LongBuffer to speed up I/O, but my idea would be to also store such arrays in the most compact way in order to speed-up the time needed to load them into main memory by reducing the size of what I need to read. The intuition is to store the values sorted and store the difference between one value and the next one, which is - on average- within the Integer range and as such take less space. But since this is not always true, I still cannot store all the values as integers, so this direction doesn't seem promising. Am I missing something obvious?

What is the most efficient way, in I/O time, to achieve this?


EDIT In general, regarding performance as solely I/O time, without considering space on disk, this question has better answers.

like image 568
Kuzeko Avatar asked Sep 02 '25 13:09

Kuzeko


1 Answers

You may still encode array elements as ints with the following addition:

    // The first int is the array length
    buf.putInt(array.length);

    long prev = 0;
    for (long next : array) {
        if (next - prev <= Integer.MAX_VALUE) {
            // Delta is small. Change the sign and encode as int.
            buf.putInt((int) (prev - next));
        } else {
            // Delta does not fit in 31 bits. Encode two parts of long.
            buf.putInt((int) (next >>> 32));
            buf.putInt((int) next);
        }
        prev = next;
    }

Note that 31-bit delta will be encoded as negative int. During decoding the highest (sign) bit will tell if the value is delta or a raw 63-bit long. In the latter case you read next int and compose a 63-bit long from two ints:

    // The first int is the array length
    long[] array = new long[buf.getInt()];

    long next = 0;
    for (int i = 0; i < array.length; i++) {
        int delta = buf.getInt();
        if (delta <= 0) {
            // Negative sign means the value is encoded as int delta.
            next -= delta;
        } else {
            // Positive sign means the value is encoded as raw long.
            // Read the second (lower) part of long and combine it with the higher part.
            next = (long) delta << 32 | (buf.getInt() & 0xffffffffL);
        }
        array[i] = next;
    }

This works if all values in the array are positive. If there are both positive and negative values, split them into two arrays.


By the way, a streaming compression like GZIP (or a faster alternative like LZ4) will also work well if the neighbour values are close. See GZIPOutputStream.

like image 162
apangin Avatar answered Sep 05 '25 03:09

apangin