Java File.listFiles() returns files that do 'not exist' according to `exists()`

I noticed this problem in our production code:

java.lang.IllegalArgumentException: /somePath/�.png does not exist
    at org.apache.commons.io.FileUtils.sizeOf(FileUtils.java:2413)
    at org.apache.commons.io.FileUtils.sizeOfDirectory(FileUtils.java:2479)

The underlying cause is this:

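import java.io.File;

public class FileNameTest
{

    public static void main(String[] args)
    {
        // listFiles() returns null if the path is not a directory or cannot be read
        File[] files = new File("/somePath").listFiles();
        if (files == null)
        {
            System.err.println("Cannot list /somePath");
            return;
        }
        for (File file : files)
        {
            // Every entry was just returned by listFiles(), so each one should exist
            System.out.println(file + " - " + (file.exists() ? "exists" : "missing!!"));
        }
    }

}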

Output:

0.png - exists
7.png - exists
4.png - exists
8.png - exists
1.png - exists
3.png - exists
�.png - missing!!
2.png - exists
5.png - exists
�.png - missing!!
6.png - exists
d.png - exists
$.png - exists
s.png - exists
+.png - exists
9.png - exists

The "missing" files are named with the symbols "µ" (Mu) and "€" (Euro).

It also seems that these filenames use the wrong encoding. When I list the files in bash, they show up wrong as well. When I convert the output of ls from latin1 to UTF-8, they appear correctly (at least the mu does).

But nevertheless ...

  1. these files exist
  2. file.listFiles() lists them
  3. for the 2 special cases: file.exists() returns false

I believe this is a bug in the JVM. Can anybody confirm this?

Is there already a bug-report? Any ideas how to fix this? (Renaming the files is not an option as they are user generated and might re-appear in any form or shape.)

My System:

  • Ubuntu 4.2.0
  • java version "1.8.0_102"
  • Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
  • Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
  • Apache Commons IO 2.4
Frederic Leitenberger asked Oct 14 '16 17:10

1 Answer

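It is not a bug; it is a consequence of missing encoding information in the filesystem. On Linux a file name is just a sequence of bytes, and Java has to decode those bytes into a String using the encoding it assumes for the platform. If the bytes are not valid in that encoding, Java has no way of representing the file name correctly, and converting the resulting String back into bytes produces a different name than the one on disk. Therefore the file is inaccessible from Java without specifying the correct encoding.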
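To illustrate why the lookup fails, here is a minimal sketch that needs no files at all. It assumes the name on disk is the single latin1 byte 0xB5 ("µ") followed by ".png", and that the JVM decodes file names as UTF-8. Both are assumptions based on the question, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FileNameRoundTrip {

    // Decode raw name bytes as UTF-8 and encode them again, roughly what
    // happens between listFiles() and a later exists() call (assumption:
    // the JVM uses UTF-8 for file names).
    static byte[] roundTrip(byte[] rawName) {
        String decoded = new String(rawName, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "µ.png" as stored by a latin1 client: 0xB5 is not valid UTF-8 on its own
        byte[] onDisk = {(byte) 0xB5, '.', 'p', 'n', 'g'};
        byte[] lookedUp = roundTrip(onDisk);

        // The invalid byte became U+FFFD, which encodes to three different bytes
        System.out.println("same bytes: " + Arrays.equals(onDisk, lookedUp));
        System.out.println("on disk:    " + Arrays.toString(onDisk));
        System.out.println("looked up:  " + Arrays.toString(lookedUp));
    }
}
```

The round trip does not return the original bytes, so a lookup with the re-encoded name asks the filesystem for a file that is not there.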

The simplest way to solve this is to set the file.encoding property correctly, and use that encoding in all your file names.
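Before changing anything, it may help to see which encodings the running JVM actually picked up. The sketch below prints them. Note that sun.jnu.encoding is an internal, undocumented property of HotSpot/OpenJDK builds (an assumption that it exists on your JVM), and the class name is made up for the example.

```java
import java.nio.charset.Charset;

class ShowEncodings {

    // Format a system property for display, with a marker if it is absent
    static String describe(String property) {
        return property + " = " + System.getProperty(property, "(not set)");
    }

    public static void main(String[] args) {
        System.out.println(describe("file.encoding"));
        // Internal property, reportedly used for file names; may be absent on other JVMs
        System.out.println(describe("sun.jnu.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // On Linux the JVM typically derives its defaults from the locale
        System.out.println("LANG = " + System.getenv("LANG"));
    }
}
```

Running this in the same environment as the failing application (same user, same startup script) shows whether the values match the encoding the file names were written in.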

EDIT: I found an article that shows another possible behaviour, so changing file.encoding may not help. Better test it if you want to use something other than UTF-8: http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/

I also found a possibly relevant discussion: "Setting file name encoding"
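If renaming is not an option, one thing worth trying is to compute the directory size with java.nio.file instead of java.io.File. This is only a sketch under an assumption: the OpenJDK implementation on Linux appears to keep the raw name bytes inside each Path, so entries with undecodable names may stay reachable, but that is an implementation detail you should verify on your system. The class and method names are made up for the example, and it is not the Commons IO implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

class NioDirectorySize {

    // Sum the sizes of all regular files below dir, without following symlinks
    static long sizeOfDirectory(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    total += sizeOfDirectory(entry);
                } else if (Files.isRegularFile(entry, LinkOption.NOFOLLOW_LINKS)) {
                    total += Files.size(entry);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory as the first argument; defaults to the current directory
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(sizeOfDirectory(dir) + " bytes");
    }
}
```

Entries that are neither directories nor regular files are skipped rather than raising an exception, which differs from the FileUtils.sizeOf behaviour shown in the stack trace.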

Sidias-Korrado answered Oct 13 '22 00:10