I was iterating over some files, 5328 to be precise. These are ordinary XML files of 60-200 lines at most. They are first filtered through a simple method, isXmlTestSourceFile, that parses the path.
Files.walk(Paths.get("/home/me/development/projects/myproject"), FileVisitOption.FOLLOW_LINKS)
    .filter(V3TestsGenerator::isXmlTestSourceFile)
    .filter(V3TestsGenerator::fileContainsXmlTag)
The big question concerns the second filter, specifically the method fileContainsXmlTag. For each file, I wanted to detect whether a pattern occurs at least once among its lines:
private static boolean fileContainsXmlTag(Path path) {
    try {
        return Files.readAllLines(path).stream().anyMatch(line -> PATTERN.matcher(line).find());
    } catch (IOException e) {
        e.printStackTrace();
    }
    return false;
}
For some files I then get this exception:
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at java.nio.file.Files.readAllLines(Files.java:3205)
at java.nio.file.Files.readAllLines(Files.java:3242)
But when I use FileUtiles.readLines() instead of Files.readAllLines(), everything works fine.
This is a question of curiosity, so if someone has a clue about what's going on, I'd be glad to hear it.
Thanks
The method Files.readAllLines() assumes that the file you are reading is encoded in UTF-8.
If you get this exception, then the file you are reading is most likely encoded using a different character encoding than UTF-8.
Find out which character encoding is used, and use the other readAllLines method, which allows you to specify the character encoding.
For example, if the files are encoded in ISO-8859-1:
return Files.readAllLines(path, StandardCharsets.ISO_8859_1).stream()... // etc.
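To see the difference for yourself, here is a minimal, self-contained sketch. The class name CharsetReadDemo, the helper names, and the sample content are made up for illustration; only the Files and StandardCharsets calls come from the JDK. It writes a small ISO-8859-1 file containing an accented character and then tries to decode it with both charsets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetReadDemo {

    // Hypothetical helper: true if every byte of the file decodes with the given charset.
    static boolean decodesWith(Path path, Charset charset) {
        try {
            Files.readAllLines(path, charset);
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: writes a one-line sample file encoded in ISO-8859-1.
    static Path writeLatin1Sample() {
        try {
            Path tmp = Files.createTempFile("latin1-sample", ".xml");
            // \u00e9 (e with acute accent) is the single byte 0xE9 in ISO-8859-1,
            // which is not a valid UTF-8 sequence when followed by '<'.
            Files.write(tmp, "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeLatin1Sample();
        System.out.println(decodesWith(sample, StandardCharsets.UTF_8));      // prints false
        System.out.println(decodesWith(sample, StandardCharsets.ISO_8859_1)); // prints true
    }
}
```

Note that ISO-8859-1 maps every possible byte to a character, so decoding with it never throws. That does not prove the file really is ISO-8859-1, only that the read will not fail.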
The method FileUtiles.readLines() (where does that come from?) probably assumes something else (most likely that the files are in your system's default character encoding, which is something other than UTF-8).
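Another possible reason a different utility does not throw, offered only as a guess since I don't know how FileUtiles.readLines() is implemented: readers built directly on InputStreamReader replace malformed bytes with the replacement character U+FFFD instead of reporting an error, whereas Files.readAllLines() reports them. The sketch below uses a made-up class LenientReadDemo and made-up helper names to show that behavior and to print the system default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LenientReadDemo {

    // Hypothetical helper: reads lines through a plain InputStreamReader,
    // which substitutes U+FFFD for malformed input rather than throwing.
    static List<String> readLinesLeniently(Path path, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Hypothetical helper: writes a one-line sample file encoded in ISO-8859-1.
    static Path writeLatin1Sample() {
        try {
            Path tmp = Files.createTempFile("latin1-sample", ".xml");
            Files.write(tmp, "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Decoding the ISO-8859-1 file as UTF-8 does not throw here;
        // the byte that UTF-8 cannot decode shows up as U+FFFD in the line.
        System.out.println(readLinesLeniently(writeLatin1Sample(), StandardCharsets.UTF_8));
    }
}
```

A lenient read like this is enough when you only search for an ASCII pattern such as a tag name, but it silently corrupts non-ASCII text, so specifying the correct charset remains the better fix.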