Are ZipFile InputStreams thread safe?

I cannot find any statement that specifies whether it is safe to get multiple InputStreams (from multiple ZipEntry objects) and process each in its own thread.

Would this be safe to attempt?

Would it be advisable?

Added

Might I get better performance this way?

OldCurmudgeon asked Mar 07 '12 11:03



2 Answers

Reading should be OK. Each stream contains its own state, so you can open multiple streams that point to the same file and read from them concurrently.

Simultaneous writing, however, is not safe: concurrent writes to the same file can corrupt its contents.
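
The read-only case can be sketched as follows. This is a minimal illustration, assuming a single shared ZipFile and one task per entry; the class name `SharedZipFileDemo` and the helper `writeSampleZip` are hypothetical and exist only so the example can run. It demonstrates that each stream returns the right data, not that the reads actually proceed in parallel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class SharedZipFileDemo {

    /** Reads every entry as UTF-8 text, one task per entry, all sharing one ZipFile. */
    public static Map<String, String> readAllConcurrently(Path zipPath, int threads) throws Exception {
        Map<String, String> contents = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            List<Future<Void>> pending = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                Callable<Void> task = () -> {
                    // Each task gets its own InputStream with its own read position.
                    try (InputStream in = zip.getInputStream(entry)) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buf = new byte[4096];
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                        contents.put(entry.getName(),
                                new String(out.toByteArray(), StandardCharsets.UTF_8));
                    }
                    return null;
                };
                pending.add(pool.submit(task));
            }
            // Wait for every task before the shared ZipFile is closed.
            for (Future<Void> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return contents;
    }

    /** Hypothetical helper: writes `count` small text entries so the sketch can run. */
    public static void writeSampleZip(Path zipPath, int count) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (int i = 0; i < count; i++) {
                out.putNextEntry(new ZipEntry("file" + i + ".txt"));
                out.write(("contents of file " + i).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("shared-demo", ".zip");
        try {
            writeSampleZip(zip, 3);
            System.out.println(readAllConcurrently(zip, 3));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

Note that the futures are awaited inside the try-with-resources block; closing the ZipFile earlier would also close the streams the tasks are still reading.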

AlexR answered Oct 20 '22 13:10


ZipFile InputStreams should be thread-safe, but the ZipFile API itself is instance-synchronized: internally, all the methods that read data, including those that read metadata, are guarded by synchronized (this). As a result, a ZipFile instance can be used by only one thread at a time.

If you want multiple threads to read from the same zipfile in a scalable way, you must open one ZipFile instance per thread, with each thread reading from separate InputStreams, each one derived from a different ZipEntry. That way, the per-instance lock in the ZipFile methods does not block all but one thread from reading from the zipfile at one time. It also means that when a thread closes its ZipFile after it has finished reading, it closes its own instance, not a shared one, so you don't get an exception on the second and subsequent close.

Protip: if you really care about speed, and you need multiple threads reading from the same zipfile, you can get more performance by reading all the ZipEntry objects from the first ZipFile instance and sharing them with all threads, to avoid repeating the work of reading the zipfile central directory in each thread. A ZipEntry object is not tied to a specific ZipFile instance per se; it just records metadata that will work with any ZipFile object representing the same zipfile that the ZipEntry came from. So this is the recipe for scaling up ZipFile usage in Java:

  1. Open N ZipFile instances on the file, one for each of the N worker threads.
  2. For just one of the ZipFile instances, read all the ZipEntry objects, and store them in a list.
  3. Distribute the list of ZipEntry objects to each of the N worker threads, along with the thread's own unique ZipFile instance.
  4. In parallel, within each thread, open an InputStream on the thread's ZipFile instance, using whatever ZipEntry objects you want that thread to open (e.g. you could instruct each thread to open just one of the files). Or you could put all the ZipEntry objects into a concurrent queue or parallel stream, and all the worker threads could consume entries from the queue, if you want just one thread to open each ZipEntry, while load-leveling as well as possible.
  5. When each thread has finished, it should close its own ZipFile instance.
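
The steps above could be sketched roughly as follows, using the queue variant from step 4. This is an illustration under stated assumptions, not a definitive implementation: the class name `PerThreadZipFiles` and the helper `writeSampleZip` are hypothetical, and as a small variation the entry list is read from a short-lived extra instance while each worker opens its own ZipFile inside its task. Whether this is actually faster than sharing one instance depends on the JDK version and the workload, so it is worth measuring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class PerThreadZipFiles {

    /** Counts the uncompressed bytes of every entry, using one ZipFile per worker. */
    public static Map<String, Long> countBytes(Path zipPath, int workers) throws Exception {
        // Step 2: list the entries once and put them in a shared queue.
        Queue<ZipEntry> queue = new ConcurrentLinkedQueue<>();
        try (ZipFile lister = new ZipFile(zipPath.toFile())) {
            for (ZipEntry entry : Collections.list(lister.entries())) {
                if (!entry.isDirectory()) {
                    queue.add(entry);
                }
            }
        }

        Map<String, Long> sizes = new ConcurrentHashMap<>();
        Callable<Void> worker = () -> {
            // Steps 1 and 5: each worker opens, and later closes, its own instance.
            try (ZipFile own = new ZipFile(zipPath.toFile())) {
                byte[] buf = new byte[8192];
                ZipEntry entry;
                // Steps 3 and 4: pull shared entries until the queue is drained.
                while ((entry = queue.poll()) != null) {
                    long total = 0;
                    try (InputStream in = own.getInputStream(entry)) {
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            total += n;
                        }
                    }
                    sizes.put(entry.getName(), total);
                }
            }
            return null;
        };

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                pending.add(pool.submit(worker));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return sizes;
    }

    /** Hypothetical helper: writes `count` small text entries so the sketch can run. */
    public static void writeSampleZip(Path zipPath, int count) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (int i = 0; i < count; i++) {
                out.putNextEntry(new ZipEntry("file" + i + ".txt"));
                out.write(("contents of file " + i).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("per-thread-demo", ".zip");
        try {
            writeSampleZip(zip, 8);
            System.out.println(countBytes(zip, 4));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

Because the queue hands each ZipEntry to exactly one worker, the work is balanced across threads without any per-entry assignment up front.
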

Luke Hutchison answered Oct 20 '22 13:10