 

How to preserve newlines while reading a file using a stream in Java 8

try (Stream<String> lines = Files.lines(targetFile)) {
    List<String> replacedContent = lines
            .map(line -> StringUtils.replaceEach(line, keys, values))
            .parallel()
            .collect(Collectors.toList());
    Files.write(targetFile, replacedContent);
}

I'm trying to replace multiple text patterns in each line of the file, but I'm observing that "\r\n" (bytes 13 and 10) is being replaced with just "\n" (byte 10), and my comparison tests are failing.

I want to preserve the newlines exactly as they are in the input file and don't want Java to touch them. Could anyone suggest a way to do this without having to add a separate default replacement for "\r\n"?

Asked by A.R.K.S on Feb 10 '16 at 19:02



1 Answer

The problem is that Files.lines() is implemented on top of BufferedReader.readLine(), which reads a line up until the line terminator and throws it away. Then, when you write the lines with something like Files.write(), this supplies the system-specific line terminator after each line, which might differ from the line terminator that was read in.
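To see the effect described above, here is a small sketch (the class and method names are made up for illustration) that writes a temp file with mixed CRLF/LF terminators and no trailing terminator, reads it back with Files.lines(), and rewrites it with Files.write(Path, Iterable). The terminators come back as System.lineSeparator(), and one is appended after the last line too.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LineTerminatorDemo {

    /** Reads text back through Files.lines() and rewrites it with Files.write(). */
    static String roundTrip(String original) {
        try {
            Path file = Files.createTempFile("lines-demo", ".txt");
            try {
                Files.write(file, original.getBytes(StandardCharsets.UTF_8));

                // Files.lines() (UTF-8) drops each terminator: CRLF, LF and CR alike.
                List<String> lines;
                try (Stream<String> s = Files.lines(file)) {
                    lines = s.collect(Collectors.toList());
                }

                // Files.write(Path, Iterable) appends System.lineSeparator()
                // after every line, including the last one.
                Files.write(file, lines);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "alpha\r\nbeta\ngamma"; // mixed terminators, no trailing one
        String rewritten = roundTrip(original);
        String sep = System.lineSeparator();
        System.out.println(rewritten.equals(original)); // false
        System.out.println(rewritten.equals(String.join(sep, "alpha", "beta", "gamma") + sep)); // true
    }
}
```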

If you really want to preserve the line terminators exactly as they are, even if they're a mixture of different line terminators, you could use a regex and Scanner for that.

First, define a pattern that matches a line together with its line terminator, or a final line that ends at EOF:

Pattern pat = Pattern.compile(".*\\R|.+\\z");

\\R is a special linebreak matcher (added in Java 8) that matches the usual line terminators plus a few Unicode line terminators that I've never heard of. :-) You could use something like (\\r\\n|\\r|\\n) if you want just the usual CRLF, CR, or LF terminators.
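A quick way to compare the two choices is to run both patterns over a string that also contains U+2028 LINE SEPARATOR. The sketch below (class and method names are hypothetical) collects every match from Scanner.findWithinHorizon(). With \\R all five lines come back and rejoin to the exact input. With the plain alternation, "." refuses to match U+2028 and no terminator alternative accepts it, so, because findWithinHorizon() searches ahead for the next match, the text "four" plus its separator is silently skipped. That only matters if your input can contain such characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class LinebreakPatternDemo {

    // \R: CRLF as one unit, plus LF, CR, vertical tab, form feed, NEL (U+0085),
    // U+2028 and U+2029.
    static final Pattern WITH_R = Pattern.compile(".*\\R|.+\\z");

    // Only the three common terminators; CRLF is listed first so it wins over a lone CR.
    static final Pattern CRLF_CR_LF = Pattern.compile(".*(\\r\\n|\\r|\\n)|.+\\z");

    /** Collects every match, so each line keeps its terminator attached. */
    static List<String> split(String text, Pattern pat) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            String line;
            while ((line = in.findWithinHorizon(pat, 0)) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "one\r\ntwo\nthree\rfour\u2028five";
        List<String> r = split(text, WITH_R);
        List<String> plain = split(text, CRLF_CR_LF);
        System.out.println(r.size() + " " + String.join("", r).equals(text));         // 5 true
        System.out.println(plain.size() + " " + String.join("", plain).equals(text)); // 4 false
    }
}
```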

You have to include .+\\z in order to match a potential last "line" in the file that doesn't have a line terminator. Make sure the regex always matches at least one character so that no match will be found when the Scanner reaches the end of the file.
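The role of the .+\\z alternative can be checked directly. This sketch (names are illustrative only) splits the same input with and without it: without it, an unterminated last line never matches and is dropped. It also shows that blank lines survive, since .*\\R can match a bare terminator, and that empty input yields no matches at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class TrailingLineDemo {
    // Without the second alternative, an unterminated last line never matches.
    static final Pattern NO_TAIL = Pattern.compile(".*\\R");
    // The answer's pattern: terminated lines, or a non-empty tail at end of input.
    static final Pattern WITH_TAIL = Pattern.compile(".*\\R|.+\\z");

    static List<String> split(String text, Pattern pat) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            String line;
            // Every match is at least one character long, so once the input is
            // used up nothing can match and findWithinHorizon returns null.
            while ((line = in.findWithinHorizon(pat, 0)) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("a\r\nb", NO_TAIL).size());   // 1: "b" is lost
        System.out.println(split("a\r\nb", WITH_TAIL).size()); // 2
        System.out.println(split("", WITH_TAIL).isEmpty());    // true
    }
}
```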

Then read the lines with Scanner.findWithinHorizon() until it returns null:

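try (Scanner in = new Scanner(Paths.get(INFILE), "UTF-8")) {
    String line;
    // A horizon of 0 means the search is not bounded.
    while ((line = in.findWithinHorizon(pat, 0)) != null) {
        // Process the line (its terminator is still attached), then write it
        // using something like Writer.write(String), e.g. on a writer from
        // Files.newBufferedWriter(path, StandardCharsets.UTF_8), which
        // doesn't add another line terminator.
    }
}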
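Putting this together for the question's use case, here is a sketch that copies a file line by line, applies replacements, and writes each line back with its original terminator. It assumes a separate output file (writing to the file you are still reading is not safe), and the class and method names are hypothetical. It also uses plain sequential String.replace calls as a stand-in for Commons Lang's StringUtils.replaceEach, so the result can differ when one replacement produces text that matches a later key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

public class PreserveTerminators {
    // A line together with its terminator, or a final line with no terminator.
    private static final Pattern LINE = Pattern.compile(".*\\R|.+\\z");

    /** Copies source to target line by line, applying the replacements to each line. */
    static void replaceInFile(Path source, Path target, Map<String, String> replacements) {
        try (Scanner in = new Scanner(source, "UTF-8");
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.findWithinHorizon(LINE, 0)) != null) {
                // Stand-in for StringUtils.replaceEach: sequential String.replace calls.
                for (Map.Entry<String, String> e : replacements.entrySet()) {
                    line = line.replace(e.getKey(), e.getValue());
                }
                // Writer.write(String) emits exactly the characters given, so the
                // original terminator (if any) is written back unchanged.
                out.write(line);
            }
            // Scanner swallows read errors; surface them instead of truncating silently.
            if (in.ioException() != null) {
                throw new UncheckedIOException(in.ioException());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        Files.write(src, "foo bar\r\nbar\nfoo".getBytes(StandardCharsets.UTF_8));

        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("foo", "baz");
        replaceInFile(src, dst, replacements);

        String result = new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
        System.out.println(result.equals("baz bar\r\nbar\nbaz")); // true
        Files.delete(src);
        Files.delete(dst);
    }
}
```

To update the file in place as the question's code does, one option is to write to a temporary file first and then move it over the original with Files.move(tmp, targetFile, StandardCopyOption.REPLACE_EXISTING).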
Answered by Stuart Marks on Sep 22 '22 at 16:09
