I cannot find any statement that specifies whether it would be safe to get multiple InputStreams (from multiple ZipEntry objects) and process each in its own thread.
Would this be safe to attempt? Would it be advisable?
Added: Might I get better performance this way?
No, it is not thread-safe in that sense. If multiple threads append to the same zip file, you need a lock around the writes, or the file contents could get scrambled. If they append to different zip files, each through its own ZipFile object, you're fine.
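In Java, `java.util.zip.ZipFile` is read-only, so concurrent writing goes through a `ZipOutputStream` instead. Below is a minimal sketch, assuming all threads add entries to one shared `ZipOutputStream` (the class `LockedZipWriter` is hypothetical). Note that it builds a new archive rather than appending to an existing one in place.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical helper: several threads add entries to ONE shared ZipOutputStream.
 * ZipOutputStream is not thread-safe, so each whole entry
 * (putNextEntry + write + closeEntry) is written while holding a lock;
 * otherwise bytes from different entries could interleave and corrupt the archive.
 */
final class LockedZipWriter implements AutoCloseable {
    private final ZipOutputStream zip;
    private final Object lock = new Object();

    LockedZipWriter(OutputStream out) {
        this.zip = new ZipOutputStream(out);
    }

    /** Adds one complete entry; safe to call from many threads. */
    void addEntry(String name, byte[] data) throws IOException {
        synchronized (lock) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(data);
            zip.closeEntry();
        }
    }

    @Override
    public void close() throws IOException {
        synchronized (lock) {
            zip.close(); // writes the central directory and closes the underlying stream
        }
    }
}
```

Holding the lock for a whole entry, not just for each `write` call, is what keeps every entry's bytes contiguous in the output.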
A wrapper's read operations are thread-safe if (and only if) all threads use the same SynchronizedInputStream object to access a given InputStream, and nothing other than that wrapper accesses the InputStream directly.
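`SynchronizedInputStream` is not a JDK class. Here is a minimal hypothetical version, sketched as a `FilterInputStream` whose methods all lock on the wrapper itself:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper: every operation locks on the wrapper, so concurrent
 * callers never interleave inside the underlying stream. The guarantee holds
 * only if all threads go through the SAME wrapper and nothing else touches
 * the wrapped InputStream directly.
 */
final class SynchronizedInputStream extends FilterInputStream {
    SynchronizedInputStream(InputStream in) {
        super(in);
    }

    // read(byte[]) in FilterInputStream delegates to read(byte[], int, int), so it is covered too.
    @Override public synchronized int read() throws IOException { return super.read(); }
    @Override public synchronized int read(byte[] b, int off, int len) throws IOException { return super.read(b, off, len); }
    @Override public synchronized long skip(long n) throws IOException { return super.skip(n); }
    @Override public synchronized int available() throws IOException { return super.available(); }
    @Override public synchronized void close() throws IOException { super.close(); }
    @Override public synchronized void mark(int readlimit) { super.mark(readlimit); }
    @Override public synchronized void reset() throws IOException { super.reset(); }
}
```

Each individual call is atomic, but a sequence of calls (for example "read a length, then read that many bytes") is not; callers that need that must hold `synchronized (wrapper)` around the whole sequence.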
A thread-safe routine is one that can be called concurrently from multiple threads without undesirable interactions between threads. A routine can be thread-safe for either of the following reasons: it is inherently reentrant, or it uses thread-specific data or locks on mutexes.
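A small Java sketch of those approaches: inherent reentrancy, thread-specific data (`ThreadLocal`), and a mutex (`ReentrantLock`). The class and method names are illustrative only.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch of the ways a routine can be thread-safe. */
final class ThreadSafeRoutines {
    // 1. Inherently reentrant: uses only its arguments and locals, no shared state.
    static int square(int x) {
        return x * x;
    }

    // 2. Thread-specific data: each thread gets its own (non-thread-safe) SimpleDateFormat.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    static String formatDate(Date d) {
        return FORMAT.get().format(d);
    }

    // 3. Mutual exclusion: shared mutable state guarded by a lock.
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long counter = 0;

    static long increment() {
        LOCK.lock();
        try {
            return ++counter;
        } finally {
            LOCK.unlock();
        }
    }
}
```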
Reading should be OK. Each stream contains its own state, so you can open multiple streams that point to the same file and read from them concurrently.
Simultaneous writing, however, is not safe: interleaved writes will corrupt the file.
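A minimal sketch of the reading case, assuming Java 9+ (for `InputStream.readAllBytes`); `ConcurrentFileReads` is a hypothetical name. Each task opens its own stream on the same file, so no read position or buffer is shared:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: several readers, each with its OWN stream on the same file. */
final class ConcurrentFileReads {
    static List<byte[]> readConcurrently(Path file, int readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < readers; i++) {
                futures.add(pool.submit(() -> {
                    // Independent stream: its own file position and buffer.
                    try (InputStream in = Files.newInputStream(file)) {
                        return in.readAllBytes();
                    }
                }));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                results.add(f.get()); // rethrows any failure from a reader
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```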
ZipFile InputStreams should be thread-safe, but the ZipFile API itself is instance-synchronized: internally, all the reading methods, including those that read metadata, are guarded by synchronized (this), so only one thread at a time can be inside those methods on a given ZipFile instance.
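To answer the "is it safe?" part directly, here is a sketch, assuming Java 9+, of several threads sharing ONE `ZipFile`, each opening its own `InputStream` for a different entry (`SharedZipFileReader` is a hypothetical name). It is safe, but the tasks contend for that instance's internal lock:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: several threads share ONE ZipFile instance. */
final class SharedZipFileReader {
    static Map<String, Long> entrySizes(ZipFile zip, int threads) throws Exception {
        Map<String, Long> sizes = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                if (entry.isDirectory()) {
                    continue;
                }
                futures.add(pool.submit(() -> {
                    // One stream per task; all tasks contend for the shared instance's lock.
                    try (InputStream in = zip.getInputStream(entry)) {
                        sizes.put(entry.getName(), (long) in.readAllBytes().length);
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return sizes;
    }
}
```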
If you want multiple threads to read from the same zip file in a scalable way, open one ZipFile instance per thread, and have each thread read from separate InputStreams, each derived from a different ZipEntry. That way, the per-instance lock in the ZipFile methods does not block all but one thread from reading the zip file at any given time. It also means that when each thread closes its ZipFile after it is done reading, it closes its own instance rather than a shared one, so one thread's close() cannot break the streams other threads are still reading (operations on a closed ZipFile, and reads from streams it returned, fail with an exception).
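A minimal sketch of one `ZipFile` per thread, assuming Java 9+ and that every requested entry name exists in the archive (`PerThreadZipReader` and the round-robin split are illustrative choices, not part of the answer):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.ZipFile;

/** Hypothetical helper: each worker thread opens and closes its OWN ZipFile on the archive. */
final class PerThreadZipReader {
    static Map<String, byte[]> readEntries(File archive, List<String> names, int threads)
            throws InterruptedException {
        Map<String, byte[]> contents = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                // Private instance: its internal lock is never contended by other threads,
                // and closing it cannot invalidate streams other threads are reading.
                try (ZipFile zip = new ZipFile(archive)) {
                    // Round-robin split: thread `id` handles names id, id + threads, ...
                    for (int i = id; i < names.size(); i += threads) {
                        String name = names.get(i);
                        // Assumes every name exists; getEntry returns null otherwise.
                        try (InputStream in = zip.getInputStream(zip.getEntry(name))) {
                            contents.put(name, in.readAllBytes());
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e); // ends only this worker
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return contents;
    }
}
```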
Protip: if you really care about speed and need multiple threads reading from the same ZipFile, you can get more performance by reading all the ZipEntry objects from the first ZipFile instance and sharing them with all threads, which avoids duplicating the work of reading the zip file's central directory entries in each thread. A ZipEntry object is not tied to a specific ZipFile instance per se; it just records metadata that works with any ZipFile object representing the same zip file the ZipEntry came from. So this is the recipe for scaling up ZipFile usage in Java:
1. Open N ZipFile instances on the file, one for each of the N worker threads.
2. From the first of the ZipFile instances, read all the ZipEntry objects and store them in a list.
3. Pass the list of ZipEntry objects to each of the N worker threads, along with the thread's own unique ZipFile instance.
4. In each worker thread, open an InputStream on the thread's ZipFile instance, using whatever ZipEntry objects you want that thread to open (e.g. you could instruct each thread to open just one of the files). Alternatively, put all the ZipEntry objects into a concurrent queue or parallel stream and have all the worker threads consume entries from it, if you want each ZipEntry opened by just one thread while load-leveling as well as possible.
5. When a worker thread is done, close that thread's ZipFile instance.
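The recipe above can be sketched as follows, assuming Java 9+ and using the concurrent-queue variant of step 4. `ZipRecipe` and `entrySizes` are hypothetical names, and the sketch relies on the answer's point that a `ZipEntry` read from one instance can be opened on another instance of the same file:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical sketch of the five-step recipe, with a shared queue for load-leveling. */
final class ZipRecipe {
    static Map<String, Long> entrySizes(File archive, int n) throws IOException, InterruptedException {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Step 1: open N ZipFile instances on the same file, one per worker thread.
        List<ZipFile> zips = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                zips.add(new ZipFile(archive));
            }
        } catch (IOException e) {
            for (ZipFile z : zips) { // don't leak the instances opened so far
                try { z.close(); } catch (IOException suppressed) { e.addSuppressed(suppressed); }
            }
            throw e;
        }

        // Step 2: read the ZipEntry objects once, from the first instance only.
        // Step 3: share them with every worker through a concurrent queue.
        Queue<ZipEntry> queue = new ConcurrentLinkedQueue<ZipEntry>(Collections.list(zips.get(0).entries()));
        Map<String, Long> sizes = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (ZipFile own : zips) {
            Thread worker = new Thread(() -> {
                // Step 5: try-with-resources closes this worker's own instance when it finishes.
                try (ZipFile zip = own) {
                    ZipEntry entry;
                    // Step 4: take entries until the queue is empty; each entry goes to one thread.
                    while ((entry = queue.poll()) != null) {
                        if (entry.isDirectory()) {
                            continue;
                        }
                        // The entry came from the first instance but is opened on this worker's own one.
                        try (InputStream in = zip.getInputStream(entry)) {
                            sizes.put(entry.getName(), (long) in.readAllBytes().length);
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e); // ends only this worker
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return sizes;
    }
}
```

The queue gives load-leveling for free: a thread that finishes a small entry simply polls the next one, so no thread sits idle while others still have work.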