
Compress directory to tar.gz with Commons Compress

I'm running into a problem using the Commons Compress library to create a tar.gz of a directory. My directory structure is as follows.

parent/
    child/
        file1.raw
        fileN.raw

I'm using the following code to do the compression. It runs fine without exceptions. However, when I try to decompress that tar.gz, I get a single file with the name "childDirToCompress". It's the correct size, so the files have clearly been appended to each other in the tarring process. The desired output would be a directory. I can't figure out what I'm doing wrong. Can any wise Commons Compress user set me upon the correct path?

public void CreateTarGZ() throws CompressorException, FileNotFoundException, ArchiveException, IOException {
            File f = new File("parent");
            File f2 = new File("parent/childDirToCompress");

            File outFile = new File(f2.getAbsolutePath() + ".tar.gz");
            if(!outFile.exists()){
                outFile.createNewFile();
            }
            FileOutputStream fos = new FileOutputStream(outFile);

            TarArchiveOutputStream taos = new TarArchiveOutputStream(new GZIPOutputStream(new BufferedOutputStream(fos)));
            taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_STAR); 
            taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
            addFilesToCompression(taos, f2, ".");
            taos.close();

        }

        private static void addFilesToCompression(TarArchiveOutputStream taos, File file, String dir) throws IOException{
            taos.putArchiveEntry(new TarArchiveEntry(file, dir));

            if (file.isFile()) {
                BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
                IOUtils.copy(bis, taos);
                taos.closeArchiveEntry();
                bis.close();
            }

            else if(file.isDirectory()) {
                taos.closeArchiveEntry();
                for (File childFile : file.listFiles()) {
                    addFilesToCompression(taos, childFile, file.getName());

                }
            }
        }

awfulHack asked Nov 19 '12 20:11



4 Answers

I followed this solution and it worked until I processed a larger set of files: it crashed at random after 15,000 to 16,000 files. The following line leaks file handles:

IOUtils.copy(new FileInputStream(f), tOut);

The code crashed with a "Too many open files" error at the OS level. The following minor change fixes the problem:

FileInputStream in = new FileInputStream(f);
IOUtils.copy(in, tOut);
in.close();
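
If an exception is thrown between opening the stream and the explicit close(), the handle would still leak. A minimal sketch of a more defensive variant, assuming Java 7 or later: try-with-resources closes the stream on every path. The class name, the helper copyInto, and the ByteArrayOutputStream standing in for tOut are illustrative and not part of the original code; the read loop stands in for IOUtils.copy so the sketch needs nothing beyond the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class SafeCopySketch {

    // Hypothetical helper: copies one file into an already-open stream
    // (for example the tar output stream) and always releases the handle.
    static long copyInto(File source, OutputStream target) throws IOException {
        long total = 0;
        // try-with-resources closes 'in' even if read or write throws
        try (InputStream in = new FileInputStream(source)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                target.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sketch", ".raw");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for tOut
        long copied = copyInto(tmp, out);
        System.out.println(copied + " bytes copied");
    }
}
```

In the tar code, the same try block would wrap the IOUtils.copy(in, tOut) call, with closeArchiveEntry() called after it.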

user3613365 answered Oct 19 '22 10:10


I haven't figured out exactly what was going wrong, but after scouring Google's caches I found a working example. Sorry for the tumbleweed!

public void CreateTarGZ()
    throws FileNotFoundException, IOException
{
    System.out.println(new File(".").getAbsolutePath());
    String dirPath = "parent/childDirToCompress/";
    String tarGzPath = "archive.tar.gz";
    // try-with-resources closes the streams in reverse order, even on failure,
    // and avoids a NullPointerException if one of them could not be opened
    try (FileOutputStream fOut = new FileOutputStream(new File(tarGzPath));
         BufferedOutputStream bOut = new BufferedOutputStream(fOut);
         GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(bOut);
         TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {
        addFileToTarGz(tOut, dirPath, "");
        tOut.finish();
    }
}

private void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base)
    throws IOException
{
    File f = new File(path);
    System.out.println(f.exists());
    String entryName = base + f.getName();
    TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
    tOut.putArchiveEntry(tarEntry);

    if (f.isFile()) {
        // Note: this FileInputStream is never closed; on large trees that can exhaust file handles
        IOUtils.copy(new FileInputStream(f), tOut);
        tOut.closeArchiveEntry();
    } else {
        tOut.closeArchiveEntry();
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                System.out.println(child.getName());
                addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
            }
        }
    }
}
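
The part that decides the extracted layout is the entry name passed as the second argument to TarArchiveEntry. In the question's code that argument is "." for the root and file.getName() of the parent directory for every child, so each file appears to be written under the same name, which would explain the single extracted file. This reading is an inference from the code, not something confirmed in the thread. The sketch below mirrors only the naming recursion of addFileToTarGz, using the JDK alone; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EntryNameSketch {

    // Mirrors the recursion of addFileToTarGz but only records the entry
    // names, so the archive layout can be inspected without writing a tar.
    static void collectNames(File f, String base, List<String> names) {
        String entryName = base + f.getName();
        names.add(entryName);
        File[] children = f.listFiles(); // null for regular files
        if (children != null) {
            Arrays.sort(children); // stable order for the demonstration
            for (File child : children) {
                collectNames(child, entryName + "/", names);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("sketch").toFile();
        File dir = new File(root, "childDirToCompress");
        if (!dir.mkdir()) {
            throw new IOException("could not create " + dir);
        }
        new File(dir, "file1.raw").createNewFile();
        new File(dir, "fileN.raw").createNewFile();

        List<String> names = new ArrayList<>();
        collectNames(dir, "", names);
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

Each child gets its parent's entry name plus "/" as a prefix, so the names stay distinct and nest under childDirToCompress.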

awfulHack answered Oct 19 '22 11:10


I ended up doing the following:

public URL createTarGzip() throws IOException {
    Path inputDirectoryPath = ...
    File outputFile = new File("/path/to/filename.tar.gz");

    try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
            BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
            GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
            TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {

        tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
        tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);

        List<File> files = new ArrayList<>(FileUtils.listFiles(
                inputDirectoryPath,
                new RegexFileFilter("^(.*?)"),
                DirectoryFileFilter.DIRECTORY
        ));

        for (int i = 0; i < files.size(); i++) {
            File currentFile = files.get(i);

            String relativeFilePath = new File(inputDirectoryPath.toUri()).toURI().relativize(
                    new File(currentFile.getAbsolutePath()).toURI()).getPath();

            TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
            tarEntry.setSize(currentFile.length());

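            tarArchiveOutputStream.putArchiveEntry(tarEntry);
            // Stream the file and close its handle instead of reading it fully into memory
            try (FileInputStream in = new FileInputStream(currentFile)) {
                IOUtils.copy(in, tarArchiveOutputStream);
            }
            tarArchiveOutputStream.closeArchiveEntry();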
        }
        tarArchiveOutputStream.close();
        return outputFile.toURI().toURL();
    }
}

This takes care of some of the edge cases that come up in the other solutions.
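
The URI.relativize call above derives each entry name relative to the input directory. A rough sketch of the same idea with java.nio.file, assuming Java 8 or later: it swaps in Files.walk and Path.relativize in place of FileUtils.listFiles and URI.relativize, so it is an alternative technique rather than this answer's code. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RelativeEntrySketch {

    // Returns the entry name of 'file' relative to 'inputDir', using '/'
    // as the separator regardless of the platform's own separator.
    static String entryName(Path inputDir, Path file) {
        return inputDir.relativize(file).toString().replace(File.separatorChar, '/');
    }

    // Lists every regular file below 'inputDir' as a relative entry name.
    static List<String> entryNames(Path inputDir) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done
        try (Stream<Path> paths = Files.walk(inputDir)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> entryName(inputDir, p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Files.createDirectories(dir.resolve("sub"));
        Files.createFile(dir.resolve("a.raw"));
        Files.createFile(dir.resolve("sub").resolve("b.raw"));
        System.out.println(entryNames(dir)); // prints [a.raw, sub/b.raw]
    }
}
```

Each returned name could then be passed as the second argument to TarArchiveEntry alongside the corresponding file.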

merrick answered Oct 19 '22 09:10


I had to make some path-related adjustments to @merrick's solution to get it to work, perhaps because of the latest Maven dependencies. The currently accepted solution didn't work for me.

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.RegexFileFilter;

public class TAR {

    public static void CreateTarGZ(String inputDirectoryPath, String outputPath) throws IOException {

        File inputFile = new File(inputDirectoryPath);
        File outputFile = new File(outputPath);

        try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
                BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
                GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
                TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {

            tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
            tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);

            List<File> files = new ArrayList<>(FileUtils.listFiles(
                    inputFile,
                    new RegexFileFilter("^(.*?)"),
                    DirectoryFileFilter.DIRECTORY
            ));

            for (int i = 0; i < files.size(); i++) {
                File currentFile = files.get(i);

                String relativeFilePath = inputFile.toURI().relativize(
                        new File(currentFile.getAbsolutePath()).toURI()).getPath();

                TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
                tarEntry.setSize(currentFile.length());

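                tarArchiveOutputStream.putArchiveEntry(tarEntry);
                // Stream the file and close its handle instead of reading it fully into memory
                try (FileInputStream in = new FileInputStream(currentFile)) {
                    IOUtils.copy(in, tarArchiveOutputStream);
                }
                tarArchiveOutputStream.closeArchiveEntry();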
            }
            tarArchiveOutputStream.close();
        }
    }
}

Maven

        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-compress</artifactId>
            <version>1.18</version>
        </dependency>

conteh answered Oct 19 '22 09:10