 

Improper zip entry names when writing filenames containing non-English characters, even with Java 7

I am trying to write code that zips files whose names contain non-English characters (umlauts, Arabic, etc.), but the resulting zip file contains garbled names. I am using Java version 1.7.0_45, so it shouldn't be due to the bug mentioned here. I pass UTF-8 as the charset to the ZipOutputStream constructor, and according to the Javadoc that should be all I need.

I am confident that the zip file is written correctly, because reading the entries back from the file gives the proper filenames (as expected).
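The same write-then-read check can be reproduced without touching the filesystem. The sketch below writes empty entries into an in-memory archive with `ZipOutputStream(OutputStream, Charset)` and reads them back with `ZipInputStream(InputStream, Charset)`, both using UTF-8. The class `ZipRoundTrip`, the helper `roundTrip` and the empty entry contents are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipRoundTrip {

    // Hypothetical helper: writes one empty entry per name into an in-memory zip
    // (names encoded as UTF-8), then returns the entry names as read back.
    static List<String> roundTrip(List<String> names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        List<String> result = new ArrayList<>();
        try {
            try (ZipOutputStream zos = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
                for (String name : names) {
                    zos.putNextEntry(new ZipEntry(name));
                    zos.closeEntry();
                }
            }
            try (ZipInputStream zis = new ZipInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8)) {
                ZipEntry entry;
                while ((entry = zis.getNextEntry()) != null) {
                    result.add(entry.getName());
                }
            }
        } catch (IOException e) {
            // Not expected for purely in-memory streams
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only:
        // U+1E27 is 'h with diaeresis', U+0630/U+0631 are Arabic letters,
        // U+00FC is 'u with diaeresis'
        List<String> names = Arrays.asList(
                "umlaut_\u1e27.txt", "\u0630 \u0631.txt", "ping\u00fcino.txt");
        System.out.println(roundTrip(names).equals(names)); // prints "true"
    }
}
```

A successful round trip only shows that the JDK agrees with itself; it says nothing yet about how other tools interpret the stored bytes.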

However, when I open or extract the archive with either Ubuntu's default ArchiveManager or the unzip tool, the filenames are garbled.
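To check whether the archive itself is at fault, one can look at the raw bytes of the first local file header. In the ZIP specification (PKWARE's APPNOTE), bit 11 of the general-purpose flag marks the entry name as UTF-8. The sketch below builds a one-entry archive in memory and reads that flag and the undecoded name bytes. The class `ZipHeaderCheck` and its helpers are hypothetical; it assumes the first local header starts at offset 0 (as it does for an archive written by `ZipOutputStream`), with the flag at offset 6, the name length at offset 26 and the name itself at offset 30, all little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipHeaderCheck {

    // Bit 11 of the general-purpose flag ("language encoding flag"): name is UTF-8
    static final int UTF8_FLAG = 1 << 11;

    // Hypothetical helper: builds a one-entry archive in memory with a UTF-8 name
    static byte[] buildZip(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.closeEntry();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // Reads an unsigned little-endian 16-bit value at the given offset
    static int readShortLE(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // General-purpose flag of the local header that starts at offset 0
    static int firstEntryFlags(byte[] zip) {
        return readShortLE(zip, 6);
    }

    // Raw (undecoded) name bytes of the local header that starts at offset 0
    static byte[] firstEntryNameBytes(byte[] zip) {
        int nameLength = readShortLE(zip, 26);
        return Arrays.copyOfRange(zip, 30, 30 + nameLength);
    }

    public static void main(String[] args) {
        byte[] zip = buildZip("ping\u00fcino.txt"); // U+00FC is 'u with diaeresis'
        boolean utf8Flag = (firstEntryFlags(zip) & UTF8_FLAG) != 0;
        System.out.println("UTF-8 flag set: " + utf8Flag);
        System.out.println("Raw name bytes: " + Arrays.toString(firstEntryNameBytes(zip)));
    }
}
```

If the flag is set and the name bytes are the UTF-8 encoding of the original name, the archive is well-formed, and any garbling happens in the tool that decodes the names.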

Here is my code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

private void convertFilesToZip(List<File> files) {
    byte[] buffer = new byte[1024];

    // Entry names are written as UTF-8; try-with-resources closes the zip even on failure
    try (ZipOutputStream outputStream = new ZipOutputStream(
            new FileOutputStream("zipFile.zip"), StandardCharsets.UTF_8)) {
        for (File file : files) {
            String filename = file.getName();
            System.out.println("Adding file : " + filename);
            outputStream.putNextEntry(new ZipEntry(filename));

            // Close each input file once its contents have been copied
            try (FileInputStream inputStream = new FileInputStream(file)) {
                int length;
                while ((length = inputStream.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, length);
                }
            }
            outputStream.closeEntry();
        }
    } catch (IOException exception) {
        exception.printStackTrace();
        return;
    }

    System.out.println("Zip created successfully");
    System.out.println("=======================================================");
    System.out.println("Reading zip Entries");

    // Read the entry names back with the same charset to verify what was stored
    try (ZipInputStream zipInputStream = new ZipInputStream(
            new FileInputStream("zipFile.zip"), StandardCharsets.UTF_8)) {
        ZipEntry zipEntry;
        while ((zipEntry = zipInputStream.getNextEntry()) != null) {
            System.out.println(zipEntry.getName());
            zipInputStream.closeEntry();
        }
    } catch (IOException exception) {
        exception.printStackTrace();
    }
}

The output for these files is as follows:

Adding file : umlaut_ḧ.txt
Adding file : ذ ر ز س ش ص ض.txt
Adding file : äǟc̈ḧös̈ ẗǚẍŸ_uploadFile4.txt
Adding file : pingüino.txt
Adding file : ÄÖÜäöüß- Español  deEspaña.ppt
Zip created successfully
=======================================================
Reading zip Entries
umlaut_ḧ.txt
ذ ر ز س ش ص ض.txt
äǟc̈ḧös̈ ẗǚẍŸ_uploadFile4.txt
pingüino.txt
ÄÖÜäöüß- Español  deEspaña.ppt

Has anyone successfully implemented this? Can someone point out what I may have missed or done wrong? I searched Google as thoroughly as I could and even tried Apache Commons Compress, but still had no luck.

The bug report says the bug is resolved in Java 7, so why is the code not working?

asked Jan 07 '14 at 14:01 by Raj Saxena


1 Answer

[Update] I finally figured out that the problem is not in the code but in Ubuntu's default ArchiveManager, which does not recognize or extract the contents properly. When the same file is opened or extracted with the Windows zip handler, it works flawlessly.
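This conclusion can be illustrated with a small sketch. The JDK's `ZipInputStream` honors the UTF-8 flag, so its Javadoc states that the charset argument is ignored when that flag is set, and names decode correctly even if a different charset is passed. A tool that ignores the flag and decodes the same bytes with a single-byte charset produces mojibake instead. ISO-8859-1 stands in here for whatever charset such a tool might assume; that choice and the class name `FlagAwareDecoding` are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class FlagAwareDecoding {

    // Writes a one-entry zip with a UTF-8 name, then reads it back while passing
    // ISO-8859-1 as the charset. Because ZipOutputStream marked the name as UTF-8
    // (general-purpose flag bit 11), ZipInputStream ignores the charset argument.
    static String readWithJdk(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            try (ZipOutputStream zos = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
            try (ZipInputStream zis = new ZipInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.ISO_8859_1)) {
                return zis.getNextEntry().getName();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    // Simulates a hypothetical extractor that ignores the flag and decodes the raw
    // UTF-8 name bytes with a single-byte charset (ISO-8859-1 chosen for illustration)
    static String decodeIgnoringFlag(String name) {
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "ping\u00fcino.txt"; // U+00FC is 'u with diaeresis'
        System.out.println(readWithJdk(name).equals(name)); // prints "true"
        System.out.println(decodeIgnoringFlag(name).equals("ping\u00c3\u00bcino.txt")); // prints "true"
    }
}
```

The two UTF-8 bytes of "ü" (0xC3 0xBC) become the two characters "Ã¼" when read as ISO-8859-1, which is the kind of garbling a flag-unaware extractor shows.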

Additionally, Commons Compress supports many other archive formats besides the zip and gzip formats supported by the JDK.

http://commons.apache.org/proper/commons-compress/index.html

answered Oct 28 '22 at 19:10 by Raj Saxena