
Extract tar.gz file in memory in Java

I'm using the Apache Commons Compress library to read a .tar.gz file, something like this:

    final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
    try {
        TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
        while (tarEntry != null) {
            byte[] btoRead = new byte[1024];
            BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
            int len = 0;
            while ((len = tarIn.read(btoRead)) != -1) {
                bout.write(btoRead, 0, len);
            }
            bout.close();
            tarEntry = tarIn.getNextTarEntry();
        }
        tarIn.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }

Is it possible not to extract this into a separate file, and instead read it into memory somehow? Maybe into a giant String or something?

Amir Afghani asked Feb 15 '14 02:02


2 Answers

You could replace the file stream with a ByteArrayOutputStream.

i.e. replace this:

    BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!

with this:

    ByteArrayOutputStream bout = new ByteArrayOutputStream();

and then, after closing bout, use bout.toByteArray() to get the bytes.
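As a rough sketch of that replacement, the inner read loop can be pulled into a small helper that drains any InputStream into a byte array. The class name `EntryBytes` and method name `drain` are made up for illustration. The sketch uses only JDK classes and relies on the behaviour the question's loop already depends on: `read` returns -1 once the current tar entry is exhausted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Commons Compress.
class EntryBytes {
    // Reads from `in` until read() returns -1 and returns everything read.
    // When `in` is a tar archive stream, -1 marks the end of the current entry.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            bout.write(buf, 0, len);
        }
        // Closing a ByteArrayOutputStream has no effect, so no close() is needed.
        return bout.toByteArray();
    }
}
```

In the question's outer loop this would presumably be called once per entry, e.g. `byte[] data = EntryBytes.drain(tarIn);`, before moving on to the next entry.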

user253751 answered Sep 28 '22 13:09


Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?

Yes, sure.

Just replace the code in the inner loop that opens files and writes to them with code that writes to a ByteArrayOutputStream ... or a series of such streams, one per entry.

The natural representation of the data that you read from the TAR this way is bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding, you are liable to mangle it ... irreversibly.)
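One way to act on that advice is to decode strictly, so that bytes which are not valid in the expected charset are rejected instead of being silently replaced. The sketch below assumes the entries are expected to be UTF-8, which is only an assumption for illustration. The class name `EntryText` and method name `decodeUtf8OrNull` are invented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; UTF-8 is an assumed encoding, not something the archive guarantees.
class EntryText {
    // Returns the decoded text, or null if the bytes are not valid UTF-8.
    static String decodeUtf8OrNull(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Not text in this encoding: the caller should keep the raw bytes.
            return null;
        }
    }
}
```

By contrast, `new String(data, StandardCharsets.UTF_8)` substitutes a replacement character for invalid input, which is the kind of irreversible mangling described above.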

Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
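If heap space is a concern, one possible guard is to cap how much of an entry is buffered and fail once the cap is exceeded. This is only a sketch: the class name `BoundedRead`, the method name `drainUpTo`, and the choice to throw an IOException are all illustrative decisions, not something taken from the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that refuses to buffer more than maxBytes in memory.
class BoundedRead {
    static byte[] drainUpTo(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        long total = 0; // long, so the running total cannot overflow
        int len;
        while ((len = in.read(buf)) != -1) {
            total += len;
            if (total > maxBytes) {
                throw new IOException("Entry exceeds in-memory limit of " + maxBytes + " bytes");
            }
            bout.write(buf, 0, len);
        }
        return bout.toByteArray();
    }
}
```

With Commons Compress, the size reported by `tarEntry.getSize()` could also be checked before reading, so that oversized entries are skipped without buffering anything.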

Stephen C answered Sep 28 '22 13:09
