
Benchmarking Zlib Java vs. C

Tags:

java

c

zlib

deflate

I'm trying to speed up my TIFF encoder, initially written in Java, by switching to C. I compiled Zlib 1.2.8 with Z_SOLO defined and a minimal set of C files: adler32.c, crc32.c, deflate.c, trees.c and zutil.c. The Java side uses java.util.zip.Deflater.

I wrote a simple test program to assess compression ratio and speed, and was puzzled to find that higher levels barely improve compression, despite the sharply increasing time they require. I was also surprised that Java actually beats a Visual Studio release build (VC2010) in both compression and speed:

Java:

Level 1 : 8424865 => 6215200 (73,8%) in 247 cycles.
Level 2 : 8424865 => 6178098 (73,3%) in 254 cycles.
Level 3 : 8424865 => 6181716 (73,4%) in 269 cycles.
Level 4 : 8424865 => 6337236 (75,2%) in 334 cycles.
Level 5 : 8424865 => 6331902 (75,2%) in 376 cycles.
Level 6 : 8424865 => 6333914 (75,2%) in 395 cycles.
Level 7 : 8424865 => 6333350 (75,2%) in 400 cycles.
Level 8 : 8424865 => 6331986 (75,2%) in 437 cycles.
Level 9 : 8424865 => 6331598 (75,2%) in 533 cycles.

C:

Level 1 : 8424865 => 6215586 (73.8%) in 298 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 309 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 331 cycles.
Level 4 : 8424865 => 6337942 (75.2%) in 406 cycles.
Level 5 : 8424865 => 6339203 (75.2%) in 457 cycles.
Level 6 : 8424865 => 6337100 (75.2%) in 481 cycles.
Level 7 : 8424865 => 6336396 (75.2%) in 492 cycles.
Level 8 : 8424865 => 6334293 (75.2%) in 547 cycles.
Level 9 : 8424865 => 6333084 (75.2%) in 688 cycles.

Am I the only one seeing such results? My guess is that the zlib in the JVM uses assembly-level optimizations that I'm not including in my C build, or that I'm missing an obvious configuration step when compiling Zlib (or the Visual Studio compiler just sucks).

Here are both snippets:

Java:

public static void main(String[] args) throws IOException {
    byte[] pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF"));
    int szin = pix.length;
    byte[] buf = new byte[szin * 101 / 100];
    int szout;
    long t0, t1;

    for (int i = 1; i <= 9; i++) {
        t0 = System.currentTimeMillis();
        Deflater deflater = new Deflater(i);
        deflater.setInput(pix);
        deflater.finish();                // finish() must precede deflate(), or the stream is never completed
        szout = deflater.deflate(buf);
        t1 = System.currentTimeMillis();
        deflater.end();                   // release the native zlib resources
        System.out.println(String.format("Level %d : %d => %d (%.1f%%) in %d cycles.", i, szin, szout, 100.0f * szout / szin, t1 - t0));
    }
}

C:

#include <stdio.h>
#include <time.h>
#include <zlib.h>

#define SZIN 9000000
#define SZOUT 10000000

int main(void)
{
    static unsigned char buf[SZIN];
    static unsigned char out[SZOUT];
    clock_t t0, t1;
    int i, ret;
    uLongf sz, szin;
    FILE* f = fopen("MY_MOSTLY_UNCOMPRESSED.TIFF", "rb");
    szin = fread(buf, 1, SZIN, f);
    fclose(f);

    for (i = 1; i <= 9; i++) {
        sz = SZOUT;
        t0 = clock();
        ret = compress2(out, &sz, buf, szin, i); /* I rewrote compress2, as it's not available when Z_SOLO is defined */
        t1 = clock();
        printf("Level %d : %lu => %lu (%.1f%%) in %ld cycles.\n", i, szin, sz, 100.0f * sz / szin, (long)(t1 - t0));
    }
    return 0;
}

EDIT:

After @MarkAdler's remarks, I tried different compression strategies through deflateInit2() (namely Z_FILTERED and Z_HUFFMAN_ONLY):

Z_FILTERED:

Level 1 : 8424865 => 6215586 (73.8%) in 299 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 310 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 330 cycles.
Level 4 : 8424865 => 6623409 (78.6%) in 471 cycles.
Level 5 : 8424865 => 6604616 (78.4%) in 501 cycles.
Level 6 : 8424865 => 6595698 (78.3%) in 528 cycles.
Level 7 : 8424865 => 6594845 (78.3%) in 536 cycles.
Level 8 : 8424865 => 6592863 (78.3%) in 595 cycles.
Level 9 : 8424865 => 6591118 (78.2%) in 741 cycles.

Z_HUFFMAN_ONLY:

Level 1 : 8424865 => 6803043 (80.7%) in 111 cycles.
Level 2 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 3 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 4 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 5 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 6 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 7 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 8 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 9 : 8424865 => 6803043 (80.7%) in 107 cycles.

As expected from his comment, Z_HUFFMAN_ONLY does not change with the compression level but runs a lot faster. With my data, Z_FILTERED was no faster and compressed slightly worse than Z_DEFAULT_STRATEGY.

Matthieu asked Jul 24 '14
1 Answer

The compression amounts and deltas are not surprising for uncompressed image data with essentially no matching strings. The portion of the data that is compressed is not being compressed further -- it is being expanded slightly by some constant amount, so the variation is all on the uncompressed portion.

There is a change in algorithm between levels 3 and 4: up to level 3, deflate goes for the first match it finds. When there are hardly any matching strings, that tends to minimize the overhead of sending string matches, hence the better compression. You might do better still by turning string matching off completely with Z_FILTERED or Z_HUFFMAN_ONLY. Z_HUFFMAN_ONLY also has the advantage of not even looking for matching strings, speeding up the compression significantly.

As for the speed difference, I can only guess that different compilers or different compiler optimizations were used.

Mark Adler answered Oct 15 '22