I'm running into a problem using the commons compress library to create a tar.gz of a directory. I have a directory structure that is as follows.
parent/
    child/
        file1.raw
        fileN.raw
I'm using the following code to do the compression. It runs fine without exceptions. However, when I decompress the resulting tar.gz, I get a single file named "childDirToCompress". It's the correct size, so the files have clearly been appended to each other in the tarring process. The desired output is a directory. I can't figure out what I'm doing wrong. Can any wise Commons Compress user set me on the correct path?
public void CreateTarGZ() throws CompressorException, FileNotFoundException, ArchiveException, IOException {
    File f = new File("parent");
    File f2 = new File("parent/childDirToCompress");
    File outFile = new File(f2.getAbsolutePath() + ".tar.gz");
    if (!outFile.exists()) {
        outFile.createNewFile();
    }
    FileOutputStream fos = new FileOutputStream(outFile);
    TarArchiveOutputStream taos = new TarArchiveOutputStream(new GZIPOutputStream(new BufferedOutputStream(fos)));
    taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_STAR);
    taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
    addFilesToCompression(taos, f2, ".");
    taos.close();
}

private static void addFilesToCompression(TarArchiveOutputStream taos, File file, String dir) throws IOException {
    taos.putArchiveEntry(new TarArchiveEntry(file, dir));
    if (file.isFile()) {
        BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
        IOUtils.copy(bis, taos);
        taos.closeArchiveEntry();
        bis.close();
    }
    else if (file.isDirectory()) {
        taos.closeArchiveEntry();
        for (File childFile : file.listFiles()) {
            addFilesToCompression(taos, childFile, file.getName());
        }
    }
}
I followed this solution and it worked until I processed a larger set of files; then it crashed, seemingly at random, after 15,000 to 16,000 files. The following line leaks file handles:
IOUtils.copy(new FileInputStream(f), tOut);
and the code crashed with a "Too many open files" error at the OS level. The following minor change fixes the problem:
FileInputStream in = new FileInputStream(f);
IOUtils.copy(in, tOut);
in.close();
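The same fix can also be written with try-with-resources, which closes the stream even if the copy throws partway through. Below is a minimal sketch using only JDK classes; StreamCopy and copyInto are names made up for illustration, and out stands in for whatever stream you are writing to (for example a TarArchiveOutputStream).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Commons Compress or Commons IO.
final class StreamCopy {

    // Copies the file's bytes into out and returns the number of bytes copied.
    // The input stream is closed when the try block exits, normally or with an
    // exception; out is left open so the caller can keep writing to it.
    static long copyInto(Path source, OutputStream out) throws IOException {
        try (InputStream in = Files.newInputStream(source)) {
            return in.transferTo(out); // InputStream.transferTo requires Java 9+
        }
    }
}
```

With a helper like this, each file's handle is released before the next file is opened, so the number of open handles stays constant no matter how many files are archived.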
I haven't figured out exactly what was going wrong, but after scouring Google caches I found a working example. Sorry for the tumbleweed!
public void CreateTarGZ()
        throws FileNotFoundException, IOException
{
    System.out.println(new File(".").getAbsolutePath());
    String dirPath = "parent/childDirToCompress/";
    String tarGzPath = "archive.tar.gz";
    // try-with-resources closes the streams in reverse order, even if an exception is thrown
    try (FileOutputStream fOut = new FileOutputStream(new File(tarGzPath));
         BufferedOutputStream bOut = new BufferedOutputStream(fOut);
         GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(bOut);
         TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {
        addFileToTarGz(tOut, dirPath, "");
        tOut.finish();
    }
}

private void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base)
        throws IOException
{
    File f = new File(path);
    System.out.println(f.exists());
    // The entry name is the path inside the archive: the parent's entry name plus this file's name
    String entryName = base + f.getName();
    TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
    tOut.putArchiveEntry(tarEntry);
    if (f.isFile()) {
        // Close the input stream after copying; leaving it open leaks one file handle per file
        try (FileInputStream in = new FileInputStream(f)) {
            IOUtils.copy(in, tOut);
        }
        tOut.closeArchiveEntry();
    } else {
        // A directory entry has no content, so close it before adding the children
        tOut.closeArchiveEntry();
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                System.out.println(child.getName());
                addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
            }
        }
    }
}
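As far as I can tell, the part of this version that matters is the second argument to the TarArchiveEntry constructor, which is the name the entry gets inside the archive. Here each call passes base + f.getName(), so the names accumulate the path from the root directory down, whereas the code in the question passes the parent directory's name, which would give every file in a directory the same entry name. The sketch below replays just that naming recursion with JDK classes and writes no archive; EntryNames and collect are hypothetical names used only for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mirrors the naming logic of addFileToTarGz
// without touching Commons Compress.
final class EntryNames {

    // Returns the entry names the recursion would produce for f and everything below it.
    static List<String> collect(File f, String base) {
        List<String> names = new ArrayList<>();
        String entryName = base + f.getName();
        names.add(entryName);
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            Arrays.sort(children); // listFiles() makes no ordering guarantee
            for (File child : children) {
                names.addAll(collect(child, entryName + "/"));
            }
        }
        return names;
    }
}
```

For a directory childDirToCompress that contains file1.raw and fileN.raw, EntryNames.collect(new File("parent/childDirToCompress"), "") yields childDirToCompress, childDirToCompress/file1.raw and childDirToCompress/fileN.raw, which is the layout you want to see after extraction.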
I ended up doing the following:
public URL createTarGzip() throws IOException {
    Path inputDirectoryPath = ...
    File outputFile = new File("/path/to/filename.tar.gz");
    try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
         BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
         GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
         TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
        tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
        tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
        // FileUtils.listFiles expects a File, so convert the Path; it returns every file, recursing into all subdirectories
        List<File> files = new ArrayList<>(FileUtils.listFiles(
                inputDirectoryPath.toFile(),
                new RegexFileFilter("^(.*?)"),
                DirectoryFileFilter.DIRECTORY
        ));
        for (int i = 0; i < files.size(); i++) {
            File currentFile = files.get(i);
            // Entry name is the file's path relative to the input directory
            String relativeFilePath = new File(inputDirectoryPath.toUri()).toURI().relativize(
                    new File(currentFile.getAbsolutePath()).toURI()).getPath();
            TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
            tarEntry.setSize(currentFile.length());
            tarArchiveOutputStream.putArchiveEntry(tarEntry);
            // Stream the file instead of loading it into memory, and close the handle afterwards
            try (FileInputStream in = new FileInputStream(currentFile)) {
                IOUtils.copy(in, tarArchiveOutputStream);
            }
            tarArchiveOutputStream.closeArchiveEntry();
        }
        // try-with-resources finishes and closes the archive when this block exits
        return outputFile.toURI().toURL();
    }
}
This takes care of some of the edge cases that come up in the other solutions.
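The entry names in this answer are paths relative to the input directory, computed with URI.relativize. A roughly equivalent way to get them with java.nio.file is sketched below, under the assumption that only regular files are wanted (which is what FileUtils.listFiles returns). RelativeNames and list are hypothetical names, and the sketch only computes the names; it does not write an archive.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, shown only to illustrate relative entry names.
final class RelativeNames {

    // Lists every regular file under inputDirectory as a '/'-separated path
    // relative to inputDirectory, sorted for a stable order.
    static List<String> list(Path inputDirectory) throws IOException {
        try (Stream<Path> walk = Files.walk(inputDirectory)) { // the stream holds directory handles, so close it
            return walk.filter(Files::isRegularFile)
                    .map(inputDirectory::relativize)
                    // tar entry names use '/', whatever the platform separator is
                    .map(p -> p.toString().replace(File.separatorChar, '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned string could then be passed as the second argument of new TarArchiveEntry(file, name), so the archive unpacks into the same layout as the input directory.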
I had to make some path-related adjustments to @merrick's solution to get it to work, perhaps because of the latest Maven dependencies. The currently accepted solution didn't work for me.
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.RegexFileFilter;
public class TAR {

    public static void CreateTarGZ(String inputDirectoryPath, String outputPath) throws IOException {
        File inputFile = new File(inputDirectoryPath);
        File outputFile = new File(outputPath);
        try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
             BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
             GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
             TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
            tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
            tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
            // Collect every file under inputFile, recursing into all subdirectories
            List<File> files = new ArrayList<>(FileUtils.listFiles(
                    inputFile,
                    new RegexFileFilter("^(.*?)"),
                    DirectoryFileFilter.DIRECTORY
            ));
            for (int i = 0; i < files.size(); i++) {
                File currentFile = files.get(i);
                // Entry name is the file's path relative to the input directory
                String relativeFilePath = inputFile.toURI().relativize(
                        new File(currentFile.getAbsolutePath()).toURI()).getPath();
                TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
                tarEntry.setSize(currentFile.length());
                tarArchiveOutputStream.putArchiveEntry(tarEntry);
                // Stream the file instead of loading it into memory, and close the handle afterwards
                try (FileInputStream in = new FileInputStream(currentFile)) {
                    IOUtils.copy(in, tarArchiveOutputStream);
                }
                tarArchiveOutputStream.closeArchiveEntry();
            }
            // try-with-resources finishes and closes the archive when this block exits
        }
    }
}
Maven
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>