Referring to: http://www.pixeldonor.com/2013/oct/12/concurrent-zip-compression-java-nio/
I'm trying to unzip a 5 GB zip file. On average it takes about 30 minutes, which is too long for our app, so I'm trying to reduce that time.
I've tried many combinations: changing the buffer size (my default write chunk is 4096 bytes), switching NIO methods, and trying different libraries. The results are all about the same.
One thing I haven't tried yet is splitting the zip file into chunks and reading those chunks in multiple threads.
The code snippet is:
    private static ExecutorService e = Executors.newFixedThreadPool(20);

    public static void main(String argv[]) {
        try {
            String selectedZipFile = "/Users/xx/Documents/test123/large.zip";
            String selectedDirectory = "/Users/xx/Documents/test2";
            long st = System.currentTimeMillis();
            unzip(selectedDirectory, selectedZipFile);
            System.out.println(System.currentTimeMillis() - st);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void unzip(String targetDir, String zipFilename) {
        ZipInputStream archive;
        try {
            List<ZipEntry> list = new ArrayList<>();
            archive = new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFilename)));
            ZipEntry entry;
            while ((entry = archive.getNextEntry()) != null) {
                list.add(entry);
            }
            for (List<ZipEntry> partition : Lists.partition(list, 1000)) {
                e.submit(new Multi(targetDir, partition, archive));
            }
        } catch (Exception e){
            e.printStackTrace();
        }
    }
And the runnable is:
    static class Multi implements Runnable {
        private List<ZipEntry> partition;
        private ZipInputStream zipInputStream;
        private String targetDir;

        public Multi(String targetDir, List<ZipEntry> partition, ZipInputStream zipInputStream) {
            this.partition = partition;
            this.zipInputStream = zipInputStream;
            this.targetDir = targetDir;
        }

        @Override
        public void run() {
            for (ZipEntry entry : partition) {
                File entryDestination = new File(targetDir, entry.getName());
                if (entry.isDirectory()) {
                    entryDestination.mkdirs();
                } else {
                    entryDestination.getParentFile().mkdirs();
                    BufferedOutputStream output = null;
                    try {
                        int n;
                        byte buf[] = new byte[BUFSIZE];
                        output = new BufferedOutputStream(new FileOutputStream(entryDestination), BUFSIZE);
                        while ((n = zipInputStream.read(buf, 0, BUFSIZE)) != -1) {
                            output.write(buf, 0, n);
                        }
                        output.flush();
                    } catch (FileNotFoundException e1) {
                        e1.printStackTrace();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    } finally {
                        try {
                            output.close();
                        } catch (IOException e1) {
                            e1.printStackTrace();
                        }
                    }
                }
            }
        }
    }
But for some reason it stores only the directories, without any file content...
My question is: what is the right way to process a large zip file in chunks with multiple threads, along the lines of the "compression" article mentioned above?
A ZipInputStream is a single stream of data; it cannot be split.
If you want multi-threaded unzipping, you need to use ZipFile. With Java 8 you even get the multi-threading for free.
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public static void unzip(String targetDir, String zipFilename) {
        Path targetDirPath = Paths.get(targetDir);
        try (ZipFile zipFile = new ZipFile(zipFilename)) {
            zipFile.stream()
                   .parallel() // enable multi-threading
                   .forEach(e -> unzipEntry(zipFile, e, targetDirPath));
        } catch (IOException e) {
            throw new RuntimeException("Error opening zip file '" + zipFilename + "': " + e, e);
        }
    }

    private static void unzipEntry(ZipFile zipFile, ZipEntry entry, Path targetDir) {
        try {
            Path targetPath = targetDir.resolve(Paths.get(entry.getName()));
            // Ask the zip entry itself: the target path does not exist on disk yet
            if (entry.isDirectory()) {
                Files.createDirectories(targetPath);
            } else {
                // Entries are processed in no particular order, so create the parent first
                Files.createDirectories(targetPath.getParent());
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Error processing zip entry '" + entry.getName() + "': " + e, e);
        }
    }
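To illustrate why the code in the question writes no file content, here is a minimal sketch. The class name ExhaustedStreamDemo, its method names, and the one-entry in-memory zip are all made up for the demonstration. Once getNextEntry() has been called until it returns null, the stream is positioned past the last entry, so a later read() has nothing left to return.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ExhaustedStreamDemo {

    // Builds a tiny in-memory zip with a single entry (illustrative data only).
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Mimics the question: list every entry first, then try to read data.
    static int readAfterListing(byte[] zipBytes) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            List<ZipEntry> entries = new ArrayList<>();
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                entries.add(entry); // the next call skips over this entry's data
            }
            // No entry is current any more, so read() reports end of data.
            return in.read(new byte[16], 0, 16);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterListing(buildZip()));
    }
}
```

The read after the listing loop returns -1, which is why each worker in the question creates its output file and then immediately finishes the copy loop with nothing written.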
You might also want to check out this answer, which uses FileSystem to access the zip file content, for a true Java 8 experience.
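For reference, a FileSystem-based variant might look roughly like the sketch below. It is not taken from the linked answer. The class name ZipFsUnzip and its method names are hypothetical, and it assumes the JDK's built-in zip file system provider is available. It lists every path inside the archive first and then copies the entries with a parallel stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZipFsUnzip {

    public static void unzip(Path zipFile, Path targetDir) throws IOException {
        // The cast selects the (Path, ClassLoader) overload unambiguously.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
            Path root = zipFs.getPath("/");
            List<Path> entries;
            // Collect all paths in the archive before copying in parallel.
            try (Stream<Path> walk = Files.walk(root)) {
                entries = walk.collect(Collectors.toList());
            }
            entries.parallelStream().forEach(source -> copyEntry(root, source, targetDir));
        }
    }

    private static void copyEntry(Path root, Path source, Path targetDir) {
        try {
            // toString() bridges from the zip file system to the default one.
            Path target = targetDir.resolve(root.relativize(source).toString());
            if (Files.isDirectory(source)) {
                Files.createDirectories(target);
            } else {
                // Entries arrive in no particular order, so create the parent first.
                Files.createDirectories(target.getParent());
                try (InputStream in = Files.newInputStream(source)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Error processing zip entry '" + source + "'", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipFsUnzip <zipFile> <targetDir>
        unzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Whether this runs faster than the ZipFile version depends on the archive and the disk, so it is worth measuring both on the actual 5 GB file.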