
How do I get the filename of a file inside a gzip in Java?

Tags: java, gzip

    // needs java.io.* and java.util.zip.GZIPInputStream
    try {
        int BUFFER_SIZE = 4096;
        byte[] buffer = new byte[BUFFER_SIZE];
        InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
        OutputStream output = new FileOutputStream("current_output_name");
        // read() returns -1 at end of stream, which ends the copy loop
        int n = input.read(buffer, 0, BUFFER_SIZE);
        while (n >= 0) {
            output.write(buffer, 0, n);
            n = input.read(buffer, 0, BUFFER_SIZE);
        }
        output.close();
        input.close();
    } catch (IOException e) {
        System.out.println("error: \n\t" + e.getMessage());
    }

Using the above code I can successfully extract a gzip's contents, but the extracted file will, as expected, always be named current_output_name (I know that is because I declared it that way in the code). My problem is that I don't know how to get the file's name while it is still inside the archive.

java.util.zip provides ZipEntry, but I couldn't use it on gzip files. Any alternatives?

asked Oct 21 '10 06:10 by lock



3 Answers

I mostly agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting that name in current Java (1.6). In the OpenJDK implementation of GZIPInputStream, the method readHeader skips over the file name:

// Skip optional file name
if ((flg & FNAME) == FNAME) {
      while (readUByte(in) != 0) ;
}

I modified the GZIPInputStream class to get the optional filename out of the gzip archive (I'm not sure if I am allowed to do that) (download the original version from here). You only need to add a member String filename; to the class and change the code above to:

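 // Read optional file name
 if ((flg & FNAME) == FNAME) {
      StringBuilder name = new StringBuilder();
      int _byte;
      // The name is zero-terminated; the gzip spec stores it as ISO 8859-1 bytes
      while ((_byte = readUByte(in)) != 0) {
           name.append((char) _byte);
      }
      filename = name.toString();
 }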

This worked for me.
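The same header field can also be read without patching the JDK class. The following is a minimal sketch, assuming the layout described in the gzip specification (a fixed 10-byte header, then the optional extra field, then the optional zero-terminated name). The class name GzipHeaderName and the method readOriginalName are hypothetical names chosen for illustration, and only the first member of the file is inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of any library.
class GzipHeaderName {
    private static final int FEXTRA = 0x04; // FLG bit: extra field present
    private static final int FNAME = 0x08;  // FLG bit: original file name present

    /** Returns the name stored in the first member's header, or null if none is stored. */
    static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Fixed part of the header: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
        byte[] fixed = new byte[10];
        data.readFully(fixed);
        if ((fixed[0] & 0xFF) != 0x1F || (fixed[1] & 0xFF) != 0x8B) {
            throw new IOException("not a gzip stream");
        }
        int flg = fixed[3] & 0xFF;

        // The extra field, if present, comes before the name and must be skipped.
        if ((flg & FEXTRA) != 0) {
            int low = data.readUnsignedByte();
            int high = data.readUnsignedByte();
            data.readFully(new byte[low | (high << 8)]); // XLEN is little-endian
        }

        if ((flg & FNAME) == 0) {
            return null; // no name was stored when the file was compressed
        }

        // The name is a zero-terminated sequence of ISO 8859-1 bytes.
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) {
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GzipHeaderName <file.gz>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readOriginalName(in));
        }
    }
}
```

Because the name is optional, callers should expect null and fall back to something else, for example the archive's own file name without the .gz suffix.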

answered Oct 07 '22 10:10 by Max


Actually, the GZIP file format does allow the original filename to be specified. A gzip file consists of one or more members, and each member's header has a FLG byte; when its FNAME bit is set, the header carries the original file name. I do not see a way to read it with the Java libraries, though.

http://www.gzip.org/zlib/rfc-gzip.html#specification

answered Oct 07 '22 11:10 by ScottS


Apache Commons Compress offers two options for obtaining the filename:

With metadata (Java 7+ sample code)

    try (GzipCompressorInputStream gcis =
             new GzipCompressorInputStream(new FileInputStream("a_gunzipped_file.gz"))) {
        String filename = gcis.getMetaData().getFilename();
    }

With "the convention" (derives the name from the archive's own file name, without reading the header)

 String filename = GzipUtils.getUnCompressedFilename("a_gunzipped_file.gz");
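For readers without the library on the classpath, the idea behind the convention can be approximated in plain Java. This is only a rough sketch under stated assumptions: the class GzipNameConvention and the method stripGzipSuffix are hypothetical, and only the .gz and .tgz suffixes are handled here, which is not a claim about the full set of rules GzipUtils applies.

```java
import java.util.Locale;

// Hypothetical helper illustrating the naming convention; not the library's code.
class GzipNameConvention {
    /** Maps "x.tgz" to "x.tar" and "x.gz" to "x"; other names are returned unchanged. */
    static String stripGzipSuffix(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".tgz")) {
            return fileName.substring(0, fileName.length() - 4) + ".tar";
        }
        if (lower.endsWith(".gz")) {
            return fileName.substring(0, fileName.length() - 3);
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(stripGzipSuffix("a_gunzipped_file.gz"));
    }
}
```

Note that this guesses the name from the archive's path, so it can differ from the name recorded in the gzip header if the archive was renamed after compression.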

References

  • Apache Commons Compress
  • GzipCompressorInputStream
  • See also: GzipUtils#getUnCompressedFilename

answered Oct 07 '22 10:10 by Stephan