 

Java Scanner.nextLine() consumes newline character

Tags:

java

regex

I have a scanner set up that is working on an InputStream.

I am using Scanner.nextLine() to advance to each line, then doing some regular expression work on each line.

My regular expression is basically [\w\p{Z}]+?[;\n\r], meant to pick up everything up to the end of the line, or just one field if the fields are semicolon-delimited.

So if my InputStream looks like:

abcd;
xyz

It will pick up abcd;, but not xyz.
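A minimal reproduction of the behavior described above, assuming the input is a plain string rather than a real InputStream. The class and method names (NextLineRegexDemo, matchesPerLine) are hypothetical; the pattern is the one quoted in the question, compiled with DOTALL as stated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextLineRegexDemo {
    // The pattern from the question, compiled with DOTALL as described.
    static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?[;\\n\\r]", Pattern.DOTALL);

    // Hypothetical helper: collects every match of FIELD in each line returned by nextLine().
    static List<String> matchesPerLine(String input) {
        List<String> found = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                Matcher m = FIELD.matcher(sc.nextLine());
                while (m.find()) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "xyz" has no ';', '\n' or '\r' left for the last part of the pattern to match.
        System.out.println(matchesPerLine("abcd;\nxyz"));   // prints [abcd;]
        // A trailing newline in the input makes no difference.
        System.out.println(matchesPerLine("abcd;\nxyz\n")); // prints [abcd;]
    }
}
```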

I think this is because the newline character at the end of each line is somehow being consumed when nextLine() is called. Can anyone tell me how to fix this problem?
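The suspicion is easy to check: Scanner.nextLine() returns the rest of the current line excluding the line separator, so no returned string contains '\n' or '\r'. A small sketch, with a hypothetical class and helper name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NextLineSeparatorDemo {
    // Hypothetical helper: returns each line exactly as nextLine() hands it back.
    static List<String> lines(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                result.add(sc.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Both "\r\n" and "\n" separators are consumed, not returned.
        for (String line : lines("abcd;\r\nxyz\n")) {
            System.out.println("[" + line + "]"); // prints [abcd;] then [xyz]
        }
    }
}
```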

As an additional point of info, I am compiling the pattern with Pattern.DOTALL.
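For what it's worth, Pattern.DOTALL only changes what '.' matches (it lets '.' match line terminators), and the pattern above contains no '.', so the flag should make no difference here. A quick sketch to check that; the class name and the finds helper are hypothetical:

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Hypothetical helper: compiles regex with the given flags and reports whether it finds a match.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // The question's pattern has no '.', so DOTALL does not change the result for "xyz".
        System.out.println(finds("[\\w\\p{Z}]+?[;\\n\\r]", 0, "xyz"));              // false
        System.out.println(finds("[\\w\\p{Z}]+?[;\\n\\r]", Pattern.DOTALL, "xyz")); // false
        // Where '.' is involved, DOTALL lets it cross a line break.
        System.out.println(finds("a.b", 0, "a\nb"));                                // false
        System.out.println(finds("a.b", Pattern.DOTALL, "a\nb"));                   // true
    }
}
```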

Thanks!

asked Dec 21 '22 02:12 by Derek


2 Answers

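Actually, you're the one causing the problem, by trying to consume a newline at the end of the last line. :-/ It's perfectly valid for the last line to end abruptly without a newline character, but your regex requires one. Note also that nextLine() returns each line without its line separator, so even a line that did end with a newline in the stream no longer has one by the time your regex sees it. You might be able to fix that by replacing the newline in your character class with an anchor (such as $) or a lookahead, but there are much easier ways to go about this.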
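A minimal sketch of the anchor and lookahead ideas, assuming a field should end at a semicolon or at the end of the line. Both patterns are illustrative variants of the question's regex, not taken from either post, and the class and method names are hypothetical. The anchor version keeps the semicolon in the match; the lookahead version only checks for it, so the semicolon is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorFixDemo {
    // A field ends at ';' or at the end of the line ($); the ';' is part of the match.
    static final Pattern ANCHOR = Pattern.compile("[\\w\\p{Z}]+?(?:;|$)");
    // Same idea with a lookahead: the ';' is required but not consumed.
    static final Pattern LOOKAHEAD = Pattern.compile("[\\w\\p{Z}]+(?=;|$)");

    // Hypothetical helper: applies the pattern to every line returned by nextLine().
    static List<String> fields(Pattern field, String input) {
        List<String> found = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                Matcher m = field.matcher(sc.nextLine());
                while (m.find()) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(fields(ANCHOR, "abcd;\nxyz"));    // prints [abcd;, xyz]
        System.out.println(fields(LOOKAHEAD, "abcd;\nxyz")); // prints [abcd, xyz]
    }
}
```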

One is to override the default delimiter and iterate over the fields with next():

import java.util.Scanner;

// Treat any run of semicolons and line breaks as a single delimiter.
Scanner sc1 = new Scanner("abcd;\nxyz");
sc1.useDelimiter("[;\r\n]+");
while (sc1.hasNext())
{
  System.out.printf("%s%n", sc1.next());
}
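Since the question reads from an InputStream rather than a String, here is a sketch of the same delimiter approach fed from a stream. The ByteArrayInputStream stands in for the real source, and UTF-8 is an assumed encoding; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterStreamDemo {
    // Hypothetical helper: splits the stream on runs of ';', '\r' and '\n'.
    static List<String> fields(InputStream in) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(in, "UTF-8")) { // charset is an assumption
            sc.useDelimiter("[;\r\n]+");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("abcd;\nxyz".getBytes(StandardCharsets.UTF_8));
        System.out.println(fields(in)); // prints [abcd, xyz]
    }
}
```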

The other is to iterate over the lines with nextLine() (using the default delimiter) and then split each line on semicolons:

import java.util.Scanner;

Scanner sc2 = new Scanner("abcd;\nxyz");
while (sc2.hasNextLine())
{
  // Split each line on semicolons; "abcd;" yields just "abcd".
  for (String item : sc2.nextLine().split(";"))
  {
    System.out.printf("%s%n", item);
  }
}
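One detail worth knowing with the split approach: the one-argument split(";") discards trailing empty strings, which is why "abcd;" produces a single item and no blank line. Passing a negative limit keeps them. A sketch with a hypothetical helper name:

```java
import java.util.Arrays;

class SplitFieldsDemo {
    // Hypothetical helper: the one-arg split drops trailing empty strings; limit -1 keeps them.
    static String[] splitFields(String line, boolean keepTrailingEmpty) {
        return keepTrailingEmpty ? line.split(";", -1) : line.split(";");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("abcd;", false))); // prints [abcd]
        System.out.println(Arrays.toString(splitFields("abcd;", true)));  // prints [abcd, ]
    }
}
```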

Scanner's API is one of the most bloated and unintuitive I've ever worked with, but you can greatly reduce the pain of using it if you remember these two crucial points:

  1. Think in terms of matching the delimiters, not the fields (like you do with String's split()).
  2. Never call one of the nextXXX() methods without first calling the corresponding hasNextXXX() method.
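To illustrate point 2, a sketch that guards each nextInt() with hasNextInt(). Calling nextInt() directly on a token like "two" would throw InputMismatchException; with the guard, the non-numeric token is simply skipped with next(). The class name, the sumInts helper and the sample input are hypothetical.

```java
import java.util.Scanner;

class HasNextGuardDemo {
    // Hypothetical helper: sums the integer tokens, skipping anything that is not an int.
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt(); // safe: hasNextInt() already confirmed the token
                } else {
                    sc.next();           // discard the non-int token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("1 two 3")); // prints 4
    }
}
```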
answered Dec 24 '22 02:12 by Alan Moore


So, why don't you add a newline to your nextLine() result?

Isn't there a special regex character, ^ or $, that stands for the string's bounds?
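A sketch of the first suggestion: re-append the "\n" that nextLine() stripped, so the question's original pattern finds its terminator on every line. Note that the match for "xyz" then includes the newline. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendNewlineDemo {
    // The question's original pattern, unchanged.
    static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?[;\\n\\r]", Pattern.DOTALL);

    // Hypothetical helper: matches FIELD against each line with a newline added back.
    static List<String> fields(String input) {
        List<String> found = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                Matcher m = FIELD.matcher(sc.nextLine() + "\n");
                while (m.find()) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String f : fields("abcd;\nxyz")) {
            // Show the newline escaped so the output stays on one line per match.
            System.out.println(f.replace("\n", "\\n")); // prints abcd; then xyz\n
        }
    }
}
```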

answered Dec 24 '22 01:12 by user1025189