Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split a huge zip file into multiple volumes?

Tags:

java

zip

When I create a zip Archive via java.util.zip.*, is there a way to split the resulting archive in multiple volumes?

Let's say my overall archive has a filesize of 24 MB and I want to split it into 3 files on a limit of 10 MB per file.
Is there a zip API which has this feature? Or any other nice ways to achieve this?

Thanks Thollsten

like image 528
Thollsten Avatar asked Oct 28 '08 16:10

Thollsten


2 Answers

Check: http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=38&t=004618

I am not aware of any public API that will help you do that. (Although if you do not want to do it programatically, there are utilities like WinSplitter that will do it)

I have not tried it but, every ZipEntry while using ZippedInput/OutputStream has a compressed size. You may get a rough estimate of the size of the zipped file while creating it. If you need 2MB of zipped files, then you can stop writing to a file after the cumulative size of entries become 1.9MB, taking .1MB for Manifest file and other zip file specific elements. So, in a nutshell, you can write a wrapper over the ZippedInputStream as follows:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ChunkedZippedOutputStream {

    private ZipOutputStream zipOutputStream;

    private final String path;
    private final String name;

    private long currentSize;
    private int currentChunkIndex;
    private final long MAX_FILE_SIZE = 16000000; // Whatever size you want
    private final String PART_POSTFIX = ".part.";
    private final String FILE_EXTENSION = ".zip";

    public ChunkedZippedOutputStream(String path, String name) throws FileNotFoundException {
        this.path = path;
        this.name = name;
        constructNewStream();
    }

    public void addEntry(ZipEntry entry) throws IOException {
        long entrySize = entry.getCompressedSize();
        if ((currentSize + entrySize) > MAX_FILE_SIZE) {
            closeStream();
            constructNewStream();
        } else {
            currentSize += entrySize;
            zipOutputStream.putNextEntry(entry);
        }
    }

    private void closeStream() throws IOException {
        zipOutputStream.close();
    }

    private void constructNewStream() throws FileNotFoundException {
        zipOutputStream = new ZipOutputStream(new FileOutputStream(new File(path, constructCurrentPartName())));
        currentChunkIndex++;
        currentSize = 0;
    }

    private String constructCurrentPartName() {
        // This will give names is the form of <file_name>.part.0.zip, <file_name>.part.1.zip, etc.
        return name + PART_POSTFIX + currentChunkIndex + FILE_EXTENSION;
    }
}

The above program is just a hint of the approach and not a final solution by any means.

like image 189
sakana Avatar answered Sep 22 '22 14:09

sakana


If the goal is to have the output be compatible with pkzip and winzip, I'm not aware of any open source libraries that do this. We had a similar requirement for one of our apps, and I wound up writing our own implementation (compatible with the zip standard). If I recall, the hardest thing for us was that we had to generate the individual files on the fly (the way that most zip utilities work is they create the big zip file, then go back and split it later - that's a lot easier to implement. Took about a day to write and 2 days to debug.

The zip standard explains what the file format has to look like. If you aren't afraid of rolling up your sleeves a bit, this is definitely doable. You do have to implement a zip file generator yourself, but you can use Java's Deflator class to generate the segment streams for the compressed data. You'll have to generate the file and section headers yourself, but they are just bytes - nothing too hard once you dive in.

Here's the zip specification - section K has the info you are looking for specifically, but you'll need to read A, B, C and F as well. If you are dealing with really big files (We were), you'll have to get into the Zip64 stuff as well - but for 24 MB, you are fine.

If you want to dive in and try it - if you run into questions, post back and I'll see if I can provide some pointers.

like image 25
Kevin Day Avatar answered Sep 22 '22 14:09

Kevin Day