
Java utility library for Nested ZIP file handling

I am aware that Oracle documents the ZIP/GZIP compression and decompression classes on their website. But I have a scenario where I need to scan an archive and find out whether any nested ZIPs/RARs are involved. For example, the following case:

-MyFiles.zip
   -MyNestedFiles.zip
        -MyMoreNestedFiles.zip
           -MoreProbably.zip
        -Other_non_zips
   -Other_non_zips
-Other_non_zips

I know that the Apache Commons Compress package and java.util.zip are the widely used packages, where Commons Compress caters for features missing from java.util.zip, e.g. some character-set handling when writing zips. What I am not sure about is whether there are utilities for recursing through nested zip files, and the answers provided on SO are not very good examples of doing this. I tried the following code (which I got from an Oracle blog), but as I suspected, the nested directory recursion fails because it simply cannot find the files:

public static void processZipFiles(String pathName) throws Exception{
        ZipInputStream zis  = null;
        InputStream  is = null;
        try {
          ZipFile zipFile = new ZipFile(new File(pathName));
          String nestPathPrefix = zipFile.getName().substring(0, zipFile.getName().length() -4);
          for(Enumeration e = zipFile.entries(); e.hasMoreElements();){
           ZipEntry ze = (ZipEntry)e.nextElement();
            if(ze.getName().contains(".zip")){
              is = zipFile.getInputStream(ze);
              zis = new ZipInputStream(is);
              ZipEntry zentry = zis.getNextEntry();

              while (zentry!=null){
                  System.out.println(zentry.getName());
                  zentry = zis.getNextEntry();
                  ZipFile nestFile = new ZipFile(nestPathPrefix+"\\"+zentry.getName());
                  if (zentry.getName().contains(".zip")) {
                      processZipFiles(nestPathPrefix+"\\"+zentry.getName());
                  }
              }
              is.close();
            }
          }
        } catch (FileNotFoundException e) {
          e.printStackTrace();
        } catch (IOException e) {
          e.printStackTrace();
        } finally{
            if(is != null)
                is.close();
            if(zis!=null)
                zis.close();
        }
    }  

Maybe I am doing something wrong, or using the wrong utilities. My objective is to identify whether any of the files, including those inside nested zip files, have file extensions which I am not allowing. This is to make sure that I can prevent my users from uploading forbidden files even when they zip them. I also have the option to use Tika, which can do recursive parsing (using Jukka Zitting's solution), but I am not sure whether I can use the Metadata to do the detection the way I want.

Any help/suggestion is appreciated.

Asked by ha9u63ar on Feb 11 '16 at 10:02


1 Answer

Using Commons Compress would be easier, not least because it has sensible shared interfaces between the various decompressors, which makes life easier and lets you handle other archive formats (e.g. Tar) at the same time.

If you do want to use only the built-in Zip support, I'd suggest you do something like this:

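import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedZipCheck {
   public static void main(String[] args) throws IOException {
      File file = new File("outermost.zip");
      try (FileInputStream input = new FileInputStream(file)) {
         check(input, file.toString());
      }
   }

   // Lists every entry, recursing into entries whose names end with .zip
   public static void check(InputStream compressedInput, String name) throws IOException {
      // Not closed here: closing it would also close the enclosing stream
      ZipInputStream input = new ZipInputStream(compressedInput);
      ZipEntry entry = null;
      while ( (entry = input.getNextEntry()) != null ) {
         System.out.println("Found " + entry.getName() + " in " + name);
         if (entry.getName().endsWith(".zip")) { // TODO Better checking
            check(input, name + "/" + entry.getName());
         }
      }
   }
}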

Your code fails because it tries to open inner.zip, which lives inside outer.zip, as a local file, but it doesn't exist as a standalone file on disk. The code above treats any entry ending in .zip as another zip file and recurses into it by reading it straight from the enclosing stream.
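To tie this back to the stated goal of rejecting uploads that hide forbidden extensions, here is a minimal sketch of the same stream-based recursion that collects offending entries instead of printing everything. The class name ForbiddenExtensionScanner, the deny-list contents and the zipOf helper are all made up for illustration; only java.util.zip is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ForbiddenExtensionScanner {

    // Hypothetical deny-list; adjust it to your own upload policy.
    private static final Set<String> FORBIDDEN =
            new HashSet<>(Arrays.asList("exe", "bat", "sh"));

    /** Returns the nested paths of all entries whose extension is forbidden. */
    public static List<String> scan(InputStream in, String name) throws IOException {
        List<String> hits = new ArrayList<>();
        scan(in, name, hits);
        return hits;
    }

    private static void scan(InputStream in, String name, List<String> hits)
            throws IOException {
        // Deliberately not closed: closing a nested stream would close the outer one too.
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = name + "/" + entry.getName();
            String ext = extensionOf(entry.getName());
            if (FORBIDDEN.contains(ext)) {
                hits.add(path);
            } else if (ext.equals("zip")) {
                // The nested archive is read from the current entry's data.
                scan(zis, path, hits);
            }
        }
    }

    private static String extensionOf(String entryName) {
        int dot = entryName.lastIndexOf('.');
        return dot < 0 ? "" : entryName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Builds a one-entry zip in memory; used only for the demo in main. */
    public static byte[] zipOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zipOf("evil.exe", new byte[] {1, 2, 3});
        byte[] outer = zipOf("inner.zip", inner);
        // Prints [outer.zip/inner.zip/evil.exe]
        System.out.println(scan(new ByteArrayInputStream(outer), "outer.zip"));
    }
}
```

Matching on the extension alone is easy to sidestep by renaming a file, so treat this as a first filter rather than a complete check. RAR archives are not covered, since java.util.zip only reads zip data.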

You probably want to use Commons Compress though, so you can handle archives with alternate filenames, other compression formats, etc.
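If you stay with the built-in classes, one way to cope with archives that don't end in .zip is to look at the leading bytes instead of the name. The sketch below is only an illustration of that idea: ZipSniffer and looksLikeZip are made-up names, and it checks solely for the local file header signature "PK\003\004", so an empty zip (which starts with a different signature) is not recognised.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;

public class ZipSniffer {

    // Local file header signature "PK\003\004" found at the start of a non-empty zip.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Peeks at the first four bytes and rewinds, so the caller can still read
     * the stream from the beginning. Relies on the mark/reset support that
     * BufferedInputStream provides.
     */
    public static boolean looksLikeZip(BufferedInputStream in) throws IOException {
        in.mark(ZIP_MAGIC.length);
        try {
            byte[] head = new byte[ZIP_MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // stream is shorter than the signature
                }
                read += n;
            }
            return Arrays.equals(head, ZIP_MAGIC);
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeZipStart = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(fakeZipStart));
        System.out.println(looksLikeZip(in)); // prints true
    }
}
```

In the recursive check, each entry's stream could be wrapped in a BufferedInputStream, tested with looksLikeZip, and then handed to a new ZipInputStream if the test passes.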

Answered by Gagravarr on Sep 20 '22 at 19:09