 

MalformedInputException with Files.readAllLines()

Tags: java, file, java-8

I was iterating over some files, 5328 to be precise. These are ordinary XML files of 60-200 lines at most. They are first filtered through a simple method, isXmlTestSourceFile, that parses the path.

    Files.walk(Paths.get("/home/me/development/projects/myproject"), FileVisitOption.FOLLOW_LINKS)
            .filter(V3TestsGenerator::isXmlTestSourceFile)
            .filter(V3TestsGenerator::fileContainsXmlTag)

My question is about the second filter, specifically the method fileContainsXmlTag. For each file, I wanted to detect whether a pattern occurs at least once among its lines:

    private static boolean fileContainsXmlTag(Path path) {
        try {
            return Files.readAllLines(path).stream().anyMatch(line -> PATTERN.matcher(line).find());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

For some files I then get this exception:

java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at java.nio.file.Files.readAllLines(Files.java:3205)
at java.nio.file.Files.readAllLines(Files.java:3242)

But when I use FileUtiles.readLines() instead of Files.readAllLines(), everything works fine.

This is mostly a question of curiosity, so if someone has a clue about what's going on, I'd be glad to hear it.

Thanks

asked Aug 08 '16 12:08 by Cousnouf


1 Answer

The one-argument method Files.readAllLines(Path) assumes that the file you are reading is encoded in UTF-8, and it throws as soon as it meets a byte sequence that is not valid UTF-8.

If you get this exception, then the file you are reading is most likely encoded using a different character encoding than UTF-8.
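To see why a non-UTF-8 file trips the decoder, here is a minimal sketch that checks whether a byte array is valid UTF-8. The class name Utf8Check and the sample strings are made up for illustration; only the java.nio.charset calls come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true if the bytes decode cleanly as UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Same text, two encodings; \u00e9 is a single byte 0xE9 in ISO-8859-1.
        String text = "<name>caf\u00e9</name>";
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

For a real file you could pass Files.readAllBytes(path) to isValidUtf8 to find out which of your files are the offenders. Note that a passing check only says the bytes are valid UTF-8, not that UTF-8 is what the author intended.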

Find out which character encoding is used, and call the other readAllLines overload, which lets you specify the character encoding.

For example, if the files are encoded in ISO-8859-1:

    return Files.readAllLines(path, StandardCharsets.ISO_8859_1).stream()... // etc.
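Spelled out in full, the filter from the question could look roughly like this. It is a sketch, not the asker's actual code: the class name XmlTagFilter and the regular expression are placeholders, since the question never shows what PATTERN matches, and ISO-8859-1 is only correct if that is really how the files are encoded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

class XmlTagFilter {
    // Hypothetical pattern: the question does not show the real PATTERN.
    static final Pattern PATTERN = Pattern.compile("<testcase\\b");

    // Same check as in the question, but the caller names the encoding.
    static boolean fileContainsXmlTag(Path path, Charset charset) {
        try {
            return Files.readAllLines(path, charset).stream()
                    .anyMatch(line -> PATTERN.matcher(line).find());
        } catch (IOException e) {
            // Fail loudly with the path instead of printing and returning false,
            // so a wrongly guessed charset is not mistaken for "no match".
            throw new UncheckedIOException("Cannot read " + path, e);
        }
    }
}
```

In the stream pipeline, the method reference would then become a lambda such as .filter(p -> XmlTagFilter.fileContainsXmlTag(p, StandardCharsets.ISO_8859_1)).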

The method FileUtiles.readLines() (where does that come from?) probably assumes a different encoding, most likely the default character encoding of your system, which would then be something other than UTF-8.
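Another way a reader can avoid this exception is by replacing invalid bytes instead of reporting them. Whether FileUtiles.readLines() does that is only a guess, since its origin is not given. The sketch below shows the mechanism with plain JDK classes; LenientLines and readLinesLenient are made-up names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LenientLines {
    // Reads all lines, substituting U+FFFD for bytes that are invalid in the given charset.
    static List<String> readLinesLenient(Path path, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // substitute, do not throw
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

This never throws MalformedInputException, but it silently damages any non-ASCII text that was written in another encoding, so it is only acceptable when, as here, you are matching an ASCII pattern and do not care about the rest of the line.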

answered Nov 10 '22 08:11 by Jesper