I have a 1 GB ZIP file containing about 2000 text files. I want to read all files and all lines as fast as possible.
try (ZipFile zipFile = new ZipFile("file.zip")) {
    zipFile.stream().parallel().forEach(entry -> readAllLines(zipFile, entry)); // reading with BufferedReader.readLine()
}
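The readAllLines helper isn't shown in the question. A minimal sketch of what it might look like, assuming the text files are UTF-8. A ZipEntry on its own can't be opened, so this version also takes the ZipFile; the class name ZipLineReader and the two-argument signature are illustrative, not from the original:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipLineReader {
    // Hypothetical helper: reads one entry line by line with BufferedReader.readLine().
    // Assumes UTF-8 content; change the charset if your files use another encoding.
    static List<String> readAllLines(ZipFile zipFile, ZipEntry entry) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(zipFile.getInputStream(entry), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Lambdas passed to forEach can't throw checked exceptions, so rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (ZipFile zipFile = new ZipFile("file.zip")) {
            zipFile.stream()
                   .parallel()
                   .filter(entry -> !entry.isDirectory())
                   .forEach(entry -> readAllLines(zipFile, entry));
        }
    }
}
```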
Result: stream.parallel() is about 30-50% faster than a sequential stream.
Question: could I improve performance even further by starting my own threads explicitly to read from the file, instead of using the parallel stream API?
Maybe. Keep in mind that switching between threads is somewhat expensive, and Java 8's parallel() is already pretty good.
Decompressing ZIP streams is CPU-intensive, so running more threads than you have CPU cores won't make things faster. If you create your own executor service and carefully balance the number of threads against the number of cores, you might be able to find a better sweet spot than Java 8's parallel().
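A minimal sketch of such an executor service, assuming a fixed thread pool whose size you tune, starting from Runtime.getRuntime().availableProcessors(). The class ZipExecutorReader, its method names, and counting bytes as the "processing" step are hypothetical placeholders for your own code. All tasks share a single ZipFile here, which can be read from several threads, but those reads may contend with each other, so it's worth measuring:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipExecutorReader {
    // Placeholder "processing": counts the uncompressed bytes of one entry.
    static long countBytes(ZipFile zipFile, ZipEntry entry) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        try (InputStream in = zipFile.getInputStream(entry)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Reads every non-directory entry on a fixed pool of nThreads threads
    // and returns the total number of uncompressed bytes.
    static long readAll(String path, int nThreads)
            throws IOException, InterruptedException, ExecutionException {
        try (ZipFile zipFile = new ZipFile(path)) {
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            try {
                List<Future<Long>> futures = new ArrayList<>();
                for (ZipEntry entry : Collections.list(zipFile.entries())) {
                    if (!entry.isDirectory()) {
                        futures.add(pool.submit(() -> countBytes(zipFile, entry)));
                    }
                }
                long total = 0;
                for (Future<Long> future : futures) {
                    total += future.get(); // rethrows task failures as ExecutionException
                }
                return total;
            } finally {
                // Stop the workers before try-with-resources closes the ZipFile.
                pool.shutdownNow();
            }
        }
    }
}
```

For example, call readAll("file.zip", Runtime.getRuntime().availableProcessors()) first, then time a few other values of nThreads around that number.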
The other remaining option is a better buffering strategy for reading the file, but that's not easy with ZIP archives. You can try ZipInputStream instead of ZipFile, but it's not easy to combine the java.io stream API with Java 8's Stream API ((de)compressing files using NIO).
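One way to try the buffering idea: wrap the file in a BufferedInputStream with a large buffer before handing it to ZipInputStream, then read the entries sequentially. This is a sketch; the 1 MiB buffer size, UTF-8 content, and the ZipStreamReader/countLines names are assumptions. Everything, including decompression, runs on a single thread, so treat it mainly as a baseline to compare the parallel versions against:

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipStreamReader {
    // Assumed buffer size to experiment with, not a measured optimum.
    static final int BUFFER_SIZE = 1 << 20; // 1 MiB

    // Reads all entries sequentially via ZipInputStream and returns the total line count.
    static long countLines(Path zip) throws IOException {
        long lines = 0;
        try (ZipInputStream zis = new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(zip), BUFFER_SIZE))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Don't close this reader: that would close the whole ZipInputStream.
                // zis.read() returns -1 at the end of the current entry, so readLine() stops there.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zis, StandardCharsets.UTF_8));
                while (reader.readLine() != null) {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```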
I recently ran into this problem and solved it by creating a separate ZipFile instance for each worker thread, like this:
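import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// path: the ZIP file's path (String)
// Collect the names of all non-directory entries once, using a temporary ZipFile.
List<String> entryNames;
try (ZipFile file = new ZipFile(path)) {
    entryNames = file.stream()
            .filter(entry -> !entry.isDirectory())
            .map(ZipEntry::getName)
            .collect(Collectors.toList());
}

// Every ZipFile opened by a worker, so all of them can be closed at the end.
Queue<ZipFile> files = new ConcurrentLinkedQueue<>();
// Each worker thread lazily opens its own ZipFile the first time it needs one.
ThreadLocal<ZipFile> ctx = ThreadLocal.withInitial(() -> {
    try {
        ZipFile file = new ZipFile(path);
        files.add(file);
        return file;
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});
try {
    entryNames.parallelStream().forEach(entryName -> {
        ZipFile file = ctx.get();
        // Look the entry up in this thread's own ZipFile.
        ZipEntry entry = file.getEntry(entryName);
        try (InputStream in = file.getInputStream(entry)) {
            byte[] bytes = in.readAllBytes(); // Java 9+
            // process bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
} finally {
    for (ZipFile file : files) {
        try { file.close(); } catch (IOException ignored) {}
    }
}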
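If you'd rather start your own threads explicitly, as the question asks, the same one-ZipFile-per-worker idea can be written without ThreadLocal: split the entry names round-robin into one slice per thread and let each thread open its own ZipFile. A sketch under those assumptions; ZipChunkedReader, readAll, and counting bytes as the "processing" step are illustrative placeholders:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipChunkedReader {
    // Reads all non-directory entries with nThreads explicit threads, each owning
    // a private ZipFile; returns the total number of uncompressed bytes read.
    static long readAll(String path, int nThreads) throws IOException, InterruptedException {
        final List<String> names;
        try (ZipFile file = new ZipFile(path)) {
            names = file.stream()
                    .filter(e -> !e.isDirectory())
                    .map(ZipEntry::getName)
                    .collect(Collectors.toList());
        }

        AtomicLong totalBytes = new AtomicLong();
        List<IOException> errors = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int first = t;
            Thread thread = new Thread(() -> {
                byte[] buffer = new byte[64 * 1024];
                long local = 0;
                // Each thread opens (and closes) its own ZipFile instance.
                try (ZipFile file = new ZipFile(path)) {
                    // Round-robin slice: entries first, first + nThreads, first + 2 * nThreads, ...
                    for (int i = first; i < names.size(); i += nThreads) {
                        ZipEntry entry = file.getEntry(names.get(i));
                        try (InputStream in = file.getInputStream(entry)) {
                            int n;
                            while ((n = in.read(buffer)) != -1) {
                                local += n; // replace with real processing
                            }
                        }
                    }
                } catch (IOException e) {
                    errors.add(e);
                }
                totalBytes.addAndGet(local);
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        if (!errors.isEmpty()) {
            throw new IOException("failed to read " + path, errors.get(0));
        }
        return totalBytes.get();
    }
}
```

Round-robin slicing keeps the work roughly even when file sizes are similar; if they vary a lot, a shared work queue would balance better.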