How can I unzip a huge zip file with multithreading in Java (preferably Java 8)?

Referring to: http://www.pixeldonor.com/2013/oct/12/concurrent-zip-compression-java-nio/

I'm trying to unzip a 5 GB zip file. On average it takes about 30 minutes, which is too long for our app, so I'm trying to reduce that time.

I've tried many combinations: changing the buffer size (by default my write chunk is 4096 bytes), switching NIO methods and trying different libraries. All the results are pretty much the same.

One thing I still haven't tried is splitting the zip file into chunks, so that multiple threads can read it chunk by chunk.

The code snippet is:

private static ExecutorService e = Executors.newFixedThreadPool(20);

public static void main(String argv[]) {
    try {
        String selectedZipFile = "/Users/xx/Documents/test123/large.zip";
        String selectedDirectory = "/Users/xx/Documents/test2";
        long st = System.currentTimeMillis();

        unzip(selectedDirectory, selectedZipFile);

        System.out.println(System.currentTimeMillis() - st);
    } catch (Exception e) {
        e.printStackTrace();
    }
}


public static void unzip(String targetDir, String zipFilename) {
    ZipInputStream archive;
    try {
        List<ZipEntry> list = new ArrayList<>();
        archive = new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFilename)));
        ZipEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            list.add(entry);
        }

        for (List<ZipEntry> partition : Lists.partition(list, 1000)) {
            e.submit(new Multi(targetDir, partition, archive));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

And the Runnable is:

static class Multi implements Runnable {

    private List<ZipEntry> partition;
    private ZipInputStream zipInputStream;
    private String targetDir;

    public Multi(String targetDir, List<ZipEntry> partition, ZipInputStream zipInputStream) {
        this.partition = partition;
        this.zipInputStream = zipInputStream;
        this.targetDir = targetDir;
    }

    @Override
    public void run() {
        for (ZipEntry entry : partition) {
            File entryDestination = new File(targetDir, entry.getName());
            if (entry.isDirectory()) {
                entryDestination.mkdirs();
            } else {
                entryDestination.getParentFile().mkdirs();

                BufferedOutputStream output = null;
                try {
                    int n;
                    byte buf[] = new byte[BUFSIZE];
                    output = new BufferedOutputStream(new FileOutputStream(entryDestination), BUFSIZE);
                    while ((n = zipInputStream.read(buf, 0, BUFSIZE)) != -1) {
                        output.write(buf, 0, n);
                    }
                    output.flush();


                } catch (FileNotFoundException e1) {
                    e1.printStackTrace();
                } catch (IOException e1) {
                    e1.printStackTrace();
                } finally {

                    try {
                        output.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }

                }
            }
        }
    }
}

But for some reason it stores only the directories, without any file content...

My question is: what is the right way to split a large zip file into chunks and process them with multiple threads, along the lines of the "compression" article mentioned above?

Asked by VitalyT, Aug 19 '18 19:08


1 Answer

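A ZipInputStream is a single sequential stream of data; it cannot be split. In your code, the first loop already consumes the whole stream with getNextEntry(), so when the worker threads later call read(), there is no current entry and read() immediately returns -1. That is why you get the directories and empty files, but no content (and even without that, several threads reading from one shared stream would interleave their reads).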
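To see concretely why the question's code writes empty files, here is a minimal, self-contained demonstration. It is an illustrative sketch, not the question's code: the class name ExhaustedStreamDemo and the two-entry in-memory archive are hypothetical. Once getNextEntry() has returned null there is no current entry, so read() reports end of stream (-1) right away; every Multi task in the question runs in exactly this situation, because the tasks are only submitted after the listing loop has finished.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ExhaustedStreamDemo {

    // Builds a small zip archive in memory with two text entries (hypothetical test data).
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Mimics the question: list all entries first, then try to read entry data afterwards.
    static int readAfterListing(byte[] zip) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            List<ZipEntry> entries = new ArrayList<>();
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                entries.add(entry);
            }
            // getNextEntry() returned null, so there is no current entry:
            // read() reports end of stream instead of returning file content.
            return in.read(new byte[4096], 0, 4096);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterListing(buildZip())); // prints -1
    }
}
```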

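If you want multi-threaded unzipping, you need to use ZipFile instead: it reads the archive's central directory and can open an independent InputStream for each entry. With Java 8 you even get the multi-threading for free, via a parallel stream.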

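import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public static void unzip(String targetDir, String zipFilename) {
    Path targetDirPath = Paths.get(targetDir).toAbsolutePath().normalize();
    try (ZipFile zipFile = new ZipFile(zipFilename)) {
        zipFile.stream()
               .parallel() // enable multi-threading
               .forEach(e -> unzipEntry(zipFile, e, targetDirPath));
    } catch (IOException e) {
        throw new RuntimeException("Error opening zip file '" + zipFilename + "': " + e, e);
    }
}

private static void unzipEntry(ZipFile zipFile, ZipEntry entry, Path targetDir) {
    try {
        Path targetPath = targetDir.resolve(entry.getName()).normalize();
        // Reject entries such as "../x" that would be written outside targetDir ("Zip Slip")
        if (!targetPath.startsWith(targetDir)) {
            throw new IOException("Entry is outside of the target dir: " + entry.getName());
        }
        if (entry.isDirectory()) { // test the entry itself, not whether the path already exists on disk
            Files.createDirectories(targetPath);
        } else {
            Files.createDirectories(targetPath.getParent());
            // Each entry gets its own InputStream, so threads never share a stream
            try (InputStream in = zipFile.getInputStream(entry)) {
                Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    } catch (IOException e) {
        throw new RuntimeException("Error processing zip entry '" + entry.getName() + "': " + e, e);
    }
}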
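If you would rather keep the explicit thread pool and the chunking idea from the question, the same ZipFile approach can be combined with an ExecutorService. The following is a sketch, not code from the question or the answer: the class name ChunkedUnzip, the threads/chunkSize parameters and the UncheckedIOException wrapping are assumptions, and plain List.subList replaces Guava's Lists.partition. Unlike the question's Multi tasks, each task obtains its own stream from zipFile.getInputStream(entry), so no stream is shared between threads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ChunkedUnzip {

    // Splits the entry list into chunks of chunkSize and extracts each chunk on a fixed thread pool.
    public static void unzip(String targetDir, String zipFilename, int threads, int chunkSize)
            throws IOException, InterruptedException, ExecutionException {
        Path target = Paths.get(targetDir).toAbsolutePath().normalize();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (ZipFile zipFile = new ZipFile(zipFilename)) {
            // The entry list comes from the central directory; no entry data is inflated yet.
            List<ZipEntry> entries = new ArrayList<>(Collections.list(zipFile.entries()));
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < entries.size(); i += chunkSize) {
                List<ZipEntry> chunk = entries.subList(i, Math.min(i + chunkSize, entries.size()));
                results.add(pool.submit(() -> {
                    for (ZipEntry entry : chunk) {
                        extract(zipFile, entry, target);
                    }
                }));
            }
            // Block until every chunk is done so the ZipFile stays open while tasks read from it;
            // get() rethrows a task failure wrapped in an ExecutionException.
            for (Future<?> result : results) {
                result.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    // Writes one entry below target; rejects names that would escape it ("Zip Slip").
    private static void extract(ZipFile zipFile, ZipEntry entry, Path target) {
        try {
            Path out = target.resolve(entry.getName()).normalize();
            if (!out.startsWith(target)) {
                throw new IOException("Entry escapes target dir: " + entry.getName());
            }
            if (entry.isDirectory()) {
                Files.createDirectories(out);
            } else {
                Files.createDirectories(out.getParent());
                // Each call returns a fresh stream, so threads never share one.
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the question's paths, a call could look like `ChunkedUnzip.unzip("/Users/xx/Documents/test2", "/Users/xx/Documents/test123/large.zip", 8, 1000);`. Note that chunking works per entry: with the standard java.util.zip APIs a single huge entry is still inflated by one thread. Whether more threads help at all depends on the storage; on a single slow disk the extraction may stay I/O-bound.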

You might also want to check out this answer, which uses FileSystem to access the zip file content, for a true Java 8 experience.
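That linked answer isn't reproduced here, but a minimal sketch of the FileSystem idea could look like the following, assuming the JDK's built-in zip file system provider (available since Java 7). The class name ZipFsUnzip is hypothetical, and the parallel stream is this sketch's choice, not necessarily what the linked answer does. The archive is opened as a FileSystem, walked like a directory tree, and each regular file is copied out with Files.copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class ZipFsUnzip {

    // Opens the archive as a FileSystem and copies every regular file into targetDir.
    public static void unzip(String targetDir, String zipFilename) throws IOException {
        Path target = Paths.get(targetDir).toAbsolutePath().normalize();
        // The (ClassLoader) cast picks the Java 7/8 overload; newer JDKs add (Path, Map) overloads.
        try (FileSystem zipFs = FileSystems.newFileSystem(Paths.get(zipFilename), (ClassLoader) null)) {
            Path root = zipFs.getPath("/");
            try (Stream<Path> paths = Files.walk(root)) {
                paths.parallel()                   // extract files on the common ForkJoinPool
                     .filter(Files::isRegularFile) // parent directories are created in copyEntry
                     .forEach(source -> copyEntry(root, source, target));
            }
        }
    }

    // Copies one file out of the zip file system, rejecting paths that escape target ("Zip Slip").
    private static void copyEntry(Path root, Path source, Path target) {
        try {
            // Zip paths belong to another provider, so resolve them via their string form.
            Path out = target.resolve(root.relativize(source).toString()).normalize();
            if (!out.startsWith(target)) {
                throw new IOException("Entry escapes target dir: " + source);
            }
            Files.createDirectories(out.getParent());
            Files.copy(source, out, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Resolving through toString() matters here: passing a zip-file-system Path directly to resolve() on a default-file-system Path would throw a ProviderMismatchException.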

Answered by Andreas, Oct 23 '22 16:10