Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write "compressed" Array to increase IO performance?

I have an int and float array each of length 220 million (fixed). Now, I want to store/upload those arrays to/from memory and disk. Currently, I am using Java NIO's FileChannel and MappedByteBuffer to solve this. It works fine, but it takes near about 5 seconds (Wall Clock Time) for storing/uploading array to/from memory to disk. Now, I want to make it faster.

Here, I should mention most of those array elements are 0 ( nearly 52 %).

like:

int arr1 [] = { 0 , 0 , 6 , 7 , 1, 0 , 0 ...}

Can anybody help me, is there any nice way to improve speed by not storing or loading those 0's. This can compensated by using Arrays.fill (array , 0).

like image 298
Arpssss Avatar asked Jun 28 '12 17:06

Arpssss


People also ask

Is Brotli better than gzip?

Brotli has a better compression ratio (i.e. it produces smaller compressed files) across every level of compression. While GZIP does beat Brotli on speed most of the time, the level you compress at factors into the results you'll see.

How much can data be compressed?

Data compression can reduce a text file to 50% or a significantly higher percentage of its original size. For data transmission, compression can be performed on the data content or on the entire transmission unit, including header data.

Can you compress compressed data?

Can you compress a file twice? You can run a compression program on the same file more than once. If you are lucky, the second time will result in the file being the same size it was before the second compression. More than 95% of the time, the result will be a larger file.


1 Answers

The following approach requires n / 8 + nz * 4 bytes on disk, where n is the size of the array, and nz the number of non-zero entries. For 52% zero entries, you'd reduce storage size by 52% - 3% = 49%.

You could do:

void write(int[] array) {
    BitSet zeroes = new BitSet();
    for (int i = 0; i < array.length; i++)
        zeroes.set(i, array[i] == 0);
    write(zeroes); // one bit per index
    for (int i = 0; i < array.length; i++)
        if (array[i] != 0)
            write(array[y]);
}

int[] read() {
    BitSet zeroes = readBitSet();
    array = new int[zeroes.length];
    for (int i = 0; i < zeroes.length; i++) {
        if (zeroes.get(i)) {
            // nothing to do (array[i] was initialized to 0)
        } else {
            array[i] = readInt();
        }
    }
}

Edit: That you say this is slightly slower implies that the disk is not the bottleneck. You could tune the above approach by writing the bitset as you construct it, so you don't have to write the bitset to memory before writing it to disk. Also, by writing the bitset word by word interspersed with the actual data we can do only a single pass over the array, reducing cache misses:

void write(int[] array) {
    writeInt(array.length);
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        int zeroesMap = 0;
        for (j = i + 31; j >= i; j--) {
            zeroesMap <<= 1;
            if (array[j] == 0) {
                zeroesMap |= 1;
            }
        }
        writeInt(zeroesMap);
        for (j = i; j < ni; j++)
            if (array[j] != 0) {
                writeInt(array[j]);
            }
        }
    }
}

int[] read() {
    int[] array = new int[readInt()];
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        zeroesMap = readInt();
        for (j = i; j < ni; j++) {
            if (zeroesMap & 1 == 1) {
                // nothing to do (array[i] was initialized to 0)
            } else {
                array[j] = readInt();
            }
            zeroesMap >>= 1;
        }
    }
    return array;
}

(The preceeding code assumes array.length is a multiple of 32. If not, write the last slice of the array in whatever way you like)

If that doesn't reduce proceccing time either, compression is not the way to go (I don't think any general purpose compression algorithm will be faster than the above).

like image 95
meriton Avatar answered Nov 09 '22 12:11

meriton