
Zipping a huge folder by using a ZipFileSystem results in OutOfMemoryError

Tags: java, zip, nio

The java.nio package has a beautiful way of handling zip files: it can treat them as file systems, which lets us work with zip file contents like ordinary files. Thus, zipping a whole folder can be achieved by simply using Files.copy to copy all the files into the zip file. Since subfolders are to be copied as well, we need a visitor:

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

private static class CopyFileVisitor extends SimpleFileVisitor<Path> {
    private final Path targetPath;
    private Path sourcePath = null;

    public CopyFileVisitor(Path targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
            final BasicFileAttributes attrs) throws IOException {
        if (sourcePath == null) {
            // The first directory visited is the source root itself
            sourcePath = dir;
        } else {
            // Recreate the subdirectory inside the target.
            // toString() is needed because source and target may live
            // in different file systems.
            Files.createDirectories(targetPath.resolve(
                    sourcePath.relativize(dir).toString()));
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
            final BasicFileAttributes attrs) throws IOException {
        Files.copy(file,
                targetPath.resolve(sourcePath.relativize(file).toString()),
                StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }
}

This is a simple "copy directory recursively" visitor. With the ZipFileSystem, however, we can also use it to copy a directory into a zip file, like this:

import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public static void zipFolder(Path zipFile, Path sourceDir) throws IOException {
    // Initialize the zip file system; "create" makes the archive if it does not exist
    Map<String, String> env = new HashMap<>();
    env.put("create", "true");
    URI uri = URI.create("jar:" + zipFile.toUri());

    // Closing the file system (via try-with-resources) finalizes the archive
    try (FileSystem fileSystem = FileSystems.newFileSystem(uri, env)) {
        Path root = fileSystem.getRootDirectories().iterator().next();

        // Simply copy the directory into the root of the zip file system
        Files.walkFileTree(sourceDir, new CopyFileVisitor(root));
    }
}

This is what I call an elegant way of zipping a whole folder. However, when I use this method on a huge folder (around 3 GB), I get an OutOfMemoryError (heap space). A usual zip handling library does not raise this error, so it seems that the way the ZipFileSystem handles the copy is very inefficient: too much of the data to be written is kept in memory, which causes the OutOfMemoryError.
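For reference, this is roughly what I mean by a usual zip handling library: a minimal streaming sketch with java.util.zip.ZipOutputStream (the method name zipFolderStreaming is just for illustration). Each entry is streamed through a small buffer rather than held in memory:

import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public static void zipFolderStreaming(Path zipFile, Path sourceDir) throws IOException {
    try (OutputStream out = Files.newOutputStream(zipFile);
         ZipOutputStream zos = new ZipOutputStream(out);
         Stream<Path> files = Files.walk(sourceDir)) {
        files.filter(Files::isRegularFile).forEach(file -> {
            try {
                // Zip entry names are '/'-separated paths relative to the source root
                zos.putNextEntry(new ZipEntry(
                        sourceDir.relativize(file).toString().replace('\\', '/')));
                // Streams the file contents directly into the archive,
                // so only a small buffer is held in memory at a time
                Files.copy(file, zos);
                zos.closeEntry();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}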

Why is this the case? Is using ZipFileSystem generally considered inefficient (in terms of memory consumption) or am I doing something wrong here?

asked May 25 '14 by gexicide

1 Answer

I looked at ZipFileSystem.java and I believe I found the source of the memory consumption: by default, the implementation uses a ByteArrayOutputStream as the buffer for compressing files, which means the buffer is limited by the amount of heap memory assigned to the JVM.

There's an (undocumented) property we can pass in the env map to make the implementation use temporary files instead ("useTempFile"). It works like this:

Map<String, Object> env = new HashMap<>();
env.put("create", "true");
// Buffer new entries in temporary files instead of on the heap
env.put("useTempFile", Boolean.TRUE);

More details here: http://www.docjar.com/html/api/com/sun/nio/zipfs/ZipFileSystem.java.html; the interesting lines are 96, 1358 and 1362.

answered Sep 23 '22 by Diego Giagio