I am aware that Oracle documents ZIP/GZIP compression and decompression methods on their website. But I have a scenario where I need to scan an upload and find out whether any nested ZIPs/RARs are involved. For example, the following case:
-MyFiles.zip
-MyNestedFiles.zip
-MyMoreNestedFiles.zip
-MoreProbably.zip
-Other_non_zips
-Other_non_zips
-Other_non_zips
I know that Apache Commons Compress and java.util.zip are the widely used packages, and that Commons Compress covers features missing from java.util.zip, e.g. some character-encoding settings when writing ZIPs. What I am not sure about is which utilities can recurse through nested ZIP files, and the answers provided on SO are not very good examples of doing this. I tried the following code (which I got from an Oracle blog), but as I suspected, the nested recursion fails because it simply cannot find the files:
public static void processZipFiles(String pathName) throws Exception{
    ZipInputStream zis = null;
    InputStream is = null;
    try {
        ZipFile zipFile = new ZipFile(new File(pathName));
        String nestPathPrefix = zipFile.getName().substring(0, zipFile.getName().length() -4);
        for(Enumeration e = zipFile.entries(); e.hasMoreElements();){
            ZipEntry ze = (ZipEntry)e.nextElement();
            if(ze.getName().contains(".zip")){
                is = zipFile.getInputStream(ze);
                zis = new ZipInputStream(is);
                ZipEntry zentry = zis.getNextEntry();
                while (zentry!=null){
                    System.out.println(zentry.getName());
                    zentry = zis.getNextEntry();
                    ZipFile nestFile = new ZipFile(nestPathPrefix+"\\"+zentry.getName());
                    if (zentry.getName().contains(".zip")) {
                        processZipFiles(nestPathPrefix+"\\"+zentry.getName());
                    }
                }
                is.close();
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally{
        if(is != null)
            is.close();
        if(zis!=null)
            zis.close();
    }
}
Maybe I am doing something wrong, or using the wrong utils. My objective is to identify whether any of the files, including those inside nested zip files, have file extensions which I am not allowing. This is to make sure that I can prevent my users from uploading forbidden files even when they zip them. I also have the option of using Tika, which can do recursive parsing (using Jukka Zitting's solution), but I am not sure whether I can use the Metadata to do this detection the way I want.
Any help/suggestion is appreciated.
Using Commons Compress would be easier, not least because it has sensible shared interfaces between the various decompressors, which makes life easier and allows handling of other archive formats (e.g. Tar) at the same time.
If you do want to use only the built-in Zip support, I'd suggest you do something like this:
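import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
public class NestedZipCheck { // class name is only a placeholder
    public static void main(String[] args) throws IOException {
        File file = new File("outermost.zip");
        try (FileInputStream input = new FileInputStream(file)) {
            check(input, file.toString());
        }
    }

    public static void check(InputStream compressedInput, String name) throws IOException {
        // Deliberately not closed: closing a nested stream would also close the enclosing one
        ZipInputStream input = new ZipInputStream(compressedInput);
        ZipEntry entry;
        while ( (entry = input.getNextEntry()) != null ) {
            System.out.println("Found " + entry.getName() + " in " + name);
            if (entry.getName().endsWith(".zip")) { // TODO Better checking
                check(input, name + "/" + entry.getName());
            }
        }
    }
}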
Your code will fail as you're trying to read inner.zip within outer.zip as a local file, but it doesn't exist as a standalone file. The code above will process things ending with .zip as another zip file, and will recurse.
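To get from listing entries to the stated goal of rejecting forbidden extensions, the same recursion can collect offending paths instead of printing them. The following is only a sketch under assumptions: the class name `NestedZipScanner`, the method names, and the deny-list contents are all made up for illustration, and it still trusts file names rather than file contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedZipScanner {
    // Hypothetical deny-list; replace with whatever your upload policy forbids.
    static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList("exe", "bat", "sh"));

    /** Returns the paths of all entries, at any nesting depth, with a forbidden extension. */
    public static List<String> findForbidden(InputStream in, String name) throws IOException {
        List<String> hits = new ArrayList<>();
        scan(new ZipInputStream(in), name, hits);
        return hits;
    }

    private static void scan(ZipInputStream zin, String path, List<String> hits) throws IOException {
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String entryPath = path + "/" + entry.getName();
            String ext = extensionOf(entry.getName());
            if (ext.equals("zip")) {
                // Read the nested archive straight from the current entry's data.
                // The nested stream is not closed, since that would close the outer one too.
                scan(new ZipInputStream(zin), entryPath, hits);
            } else if (FORBIDDEN.contains(ext)) {
                hits.add(entryPath);
            }
        }
    }

    /** Lower-cased text after the last dot of the file name, or "" if there is none. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        int slash = name.lastIndexOf('/');
        return dot > slash ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```

Calling `NestedZipScanner.findForbidden(new FileInputStream("outermost.zip"), "outermost.zip")` would return an empty list when nothing forbidden was found. For untrusted uploads you would likely also want to cap the nesting depth and the number of bytes read, which this sketch does not do.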
You probably want to use Commons Compress though, so you can handle things with alternate filenames, other compression formats, etc.
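If you do stay with java.util.zip, one way to cope with archives that have been renamed (the "TODO Better checking" above) is to peek at the first bytes of each entry instead of trusting its name. A ZIP archive with at least one entry starts with the local file header signature `PK\3\4`. The sketch below assumes a hypothetical helper class `ZipSniffer`; it is not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

public class ZipSniffer {
    /**
     * Returns true if the next four bytes are the ZIP local file header
     * signature "PK\3\4". The stream is rewound afterwards, so the caller
     * can still read it from the same position.
     */
    public static boolean looksLikeZip(BufferedInputStream in) throws IOException {
        in.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < 4) {
            int r = in.read(head, n, 4 - n);
            if (r < 0) {
                break; // stream ended before four bytes were available
            }
            n += r;
        }
        in.reset();
        return n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }
}
```

Inside the recursive loop you could wrap the current entry as `new BufferedInputStream(input)`, call `looksLikeZip` on it, and only then hand that same buffered stream to a new `ZipInputStream`. Note that an empty archive begins with a different signature (`PK\5\6`), so this check would not flag it.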