How do I get Java to use my multi-core processor with GZIPInputStream?

I'm using a GZIPInputStream in my program, and I know that the performance would be helped if I could get Java running my program in parallel.

In general, is there a command-line option for the standard VM to run on many cores? It's running on just one as it is.

Thanks!

Edit

I'm running plain ol' Java SE 6 update 17 on Windows XP.

Would putting the GZIPInputStream on a separate thread explicitly help? No! Do not put the GZIPInputStream on a separate thread! Do NOT multithread I/O!

Edit 2

I suppose I/O is the bottleneck, as I'm reading and writing to the same disk...

In general, though, is there a way to make GZIPInputStream faster? Or a replacement for GZIPInputStream that runs parallel?

Edit 3

Code snippet I used:

GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(INPUT_FILENAME));
DataInputStream in = new DataInputStream(new BufferedInputStream(gzip));
Rudiger asked Jan 01 '10 21:01


3 Answers

AFAIK the action of reading from this stream is single-threaded, so multiple CPUs won't help you if you're reading one file.

You could, however, have multiple threads, each unzipping a different file.

That being said, unzipping is not particularly computation-intensive these days; you're more likely to be blocked by the cost of I/O (e.g., if you are reading two very large files in two different areas of the hard disk).

More generally (assuming this is a question from someone new to Java), Java doesn't do things in parallel for you. You have to use threads to tell it which units of work you want done and how to synchronize between them. Java (with the help of the OS) will generally use as many cores as are available to it, and will also swap threads on the same core if there are more threads than cores (which is typically the case).
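As a rough sketch of the "one thread per file" idea above, the following uses an ExecutorService to decompress several gzip files at once. The class and method names (ParallelGunzip, countUncompressedBytes, countAll) are made up for illustration, and the code assumes Java 8 or later, which is newer than the Java SE 6 mentioned in the question. Whether it actually runs faster depends on whether the CPU or the disk is the bottleneck.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
public class ParallelGunzip {

    /** Decompresses one gzip file on the calling thread; returns the uncompressed byte count. */
    static long countUncompressedBytes(Path gzFile) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(gzFile)))) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // a real program would process buf[0..n) here
            }
            return total;
        }
    }

    /** Submits one decompression task per file to a pool sized to the number of cores. */
    static List<Long> countAll(List<Path> gzFiles) throws Exception {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Path p : gzFiles) {
                futures.add(pool.submit(() -> countUncompressedBytes(p)));
            }
            List<Long> sizes = new ArrayList<>();
            for (Future<Long> f : futures) {
                sizes.add(f.get()); // waits for the task; a task failure surfaces as ExecutionException
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each file is still decompressed by a single thread; the parallelism comes only from working on several files at the same time.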

Uri answered Nov 15 '22 19:11


PIGZ (Parallel Implementation of GZip) is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data (http://www.zlib.net/pigz/). It isn't available in Java yet; any takers? Of course the world needs it in Java.

Sometimes compression or decompression is a big CPU consumer, even though it helps keep I/O from being the bottleneck.

See also Dataseries (C++) from HP Labs. PIGZ only parallelizes the compression, while Dataseries breaks the output into large compressed blocks, which are decompressible in parallel. It also has a number of other features.
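To illustrate the general idea of independently compressed blocks, here is a minimal in-memory sketch in Java. It is not the Dataseries format and not how pigz works internally; it only shows that blocks compressed separately can also be decompressed separately, and therefore in parallel. The class and method names (BlockGzip, compressBlocks, decompressBlocks) are invented for this example, and Java 8+ parallel streams are assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
public class BlockGzip {

    /** Compresses one block into a complete, self-contained gzip stream. */
    static byte[] gzip(byte[] block) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(block);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses one self-contained gzip stream. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits data into fixed-size blocks and compresses each block independently, in parallel. */
    static List<byte[]> compressBlocks(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            blocks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + blockSize)));
        }
        // The source list is ordered, so the collected list keeps the block order.
        return blocks.parallelStream().map(BlockGzip::gzip).collect(Collectors.toList());
    }

    /** Decompresses the blocks in parallel and concatenates them in their original order. */
    static byte[] decompressBlocks(List<byte[]> compressedBlocks) {
        List<byte[]> plain = compressedBlocks.parallelStream()
                .map(BlockGzip::gunzip)
                .collect(Collectors.toList());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] b : plain) {
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }
}
```

The trade-off is a slightly worse compression ratio, since each block starts with an empty dictionary, and the reader has to know where each block begins.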

George answered Nov 15 '22 21:11


Wrap your GZIP streams in buffered streams; this should give you a significant performance increase.

// Small writes are collected in the buffer before they reach the compressor.
OutputStream out = new BufferedOutputStream(
    new GZIPOutputStream(
        new FileOutputStream(myFile)
    )
);

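And likewise for the input stream. Buffering batches many small reads and writes into fewer, larger ones. Note that the GZIP streams use a small internal buffer (512 bytes by default) when talking to the underlying stream, so to reduce the number of disk accesses specifically, it also helps to put a buffered stream between the GZIP stream and the file stream, or to pass a larger buffer size to the GZIP stream's constructor.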
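For the input side, a sketch along the same lines might look like the following. The class name BufferedGzipIo, the method names, and the 64 KB buffer size are assumptions chosen for illustration, not measured recommendations. Both the two-argument GZIPInputStream(InputStream, int) and GZIPOutputStream(OutputStream, int) constructors are part of the standard java.util.zip API.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
public class BufferedGzipIo {

    // Assumed buffer size; tune it by measuring on your own data and disk.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Opens a gzip file for reading, with buffering on both sides of the decompressor. */
    static DataInputStream openForRead(File file) throws IOException {
        return new DataInputStream(
            new BufferedInputStream(              // batches small reads of decompressed data
                new GZIPInputStream(
                    new BufferedInputStream(      // reads the file in large chunks
                        new FileInputStream(file), BUFFER_SIZE),
                    BUFFER_SIZE),
                BUFFER_SIZE));
    }

    /** Opens a gzip file for writing, with buffering on both sides of the compressor. */
    static DataOutputStream openForWrite(File file) throws IOException {
        return new DataOutputStream(
            new BufferedOutputStream(             // batches small writes before compression
                new GZIPOutputStream(
                    new BufferedOutputStream(     // writes the file in large chunks
                        new FileOutputStream(file), BUFFER_SIZE),
                    BUFFER_SIZE),
                BUFFER_SIZE));
    }
}
```

Closing the outermost stream flushes and closes the whole chain, so callers only need to close the stream they were handed.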

Sam Barnum answered Nov 15 '22 19:11