
Java Scanner question

How do you set the delimiter for a scanner to either ; or new line?

I tried: Scanner.useDelimiter(Pattern.compile("(\n)|;")); But it doesn't work.

Razvi asked Dec 30 '09 17:12


3 Answers

As a general rule, when a regex pattern is written as a Java string literal, you need to double each backslash so that a single \ reaches the regex engine.

So, try

scanner.useDelimiter(Pattern.compile("(\\n)|;"));

or

scanner.useDelimiter(Pattern.compile("[\\n;]"));

Edit: If \r\n is the problem, you might want to try this:

scanner.useDelimiter(Pattern.compile("[\\r\\n;]+"));

which matches one or more of \r, \n, and ;.

Note: I haven't tried these.
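For anyone who wants to try the patterns above, here is a minimal sketch. The class name, the helper method, and the sample input are made up for illustration. It runs the same CRLF-terminated input through the \n-only delimiter and through the [\\r\\n;]+ one, to show where the stray \r ends up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class CrlfDelimiterSketch {
    // Collects every token the Scanner produces for the given delimiter regex.
    static List<String> tokens(String delimiterRegex, String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(Pattern.compile(delimiterRegex));
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a;b\r\nc"; // assumed sample with a Windows line ending

        // Only \n and ; are delimiters here, so the \r stays glued to "b".
        List<String> lfOnly = tokens("(\\n)|;", input);
        System.out.println(lfOnly.get(1).equals("b\r")); // true

        // \r is part of the delimiter class, so the tokens come out clean.
        System.out.println(tokens("[\\r\\n;]+", input)); // [a, b, c]
    }
}
```

If a token from the first run ends in \r, the input very likely uses CRLF line endings and the delimiter needs to account for that.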

Powerlord answered Sep 30 '22 15:09


As you've discovered, you needed to look for DOS/network style \r\n (CRLF) line separators instead of the Unix style \n (LF only). But what if the text contains both? That happens a lot; in fact, when I view the source of this very page I see both varieties.

You should get in the habit of looking for both kinds of separator, as well as the older Mac style \r (CR only). Here's one way to do that:

\r?\n|\r

Plugging that into your sample code you get:

scanner.useDelimiter(";|\r?\n|\r");

This is assuming you want to match exactly one newline or semicolon at a time. If you want to match one or more you can do this instead:

scanner.useDelimiter("[;\r\n]+");

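Notice, too, that I passed in a regex string instead of a Pattern. Scanner compiles the string for you and, at least in the OpenJDK implementation, keeps a small cache of the patterns it has compiled, so pre-compiling the regex yourself doesn't buy any noticeable performance gain here.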
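To make the difference between the two delimiters concrete, here is a small sketch (class and method names are hypothetical, and the inputs are made up). With the one-at-a-time pattern, two adjacent semicolons leave an empty token between them; with the + version they collapse into a single delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class comparing the two delimiter styles.
class SingleVsRunSketch {
    // Tokenizes the input using a delimiter given as a regex string.
    static List<String> split(String delimiterRegex, String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String one = ";|\r?\n|\r"; // exactly one semicolon or line break
        String run = "[;\r\n]+";   // one or more of them in a row

        System.out.println(split(one, "a;;b").size()); // 3: "a", "", "b"
        System.out.println(split(run, "a;;b"));        // [a, b]

        // \r?\n consumes CRLF as a single delimiter, so no empty token here.
        System.out.println(split(one, "a\r\nb"));      // [a, b]
    }
}
```

Which one you want depends on whether an empty field between two delimiters is meaningful in your data.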

Alan Moore answered Sep 30 '22 14:09


Looking at the OP's comment, it looks like it was a different line ending (\r\n or CRLF) that was the problem.

Here's my answer, which handles runs of semicolons and line endings in either format (collapsing them may or may not be desired):

scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));

e.g. an input file that looks like this:

1


2;3;;4
5

would result in 1,2,3,4,5
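As a runnable version of that example, here is a rough sketch that feeds the sample in as a string instead of a file. The class and method names are invented for the demo, and the CRLF copy of the input is my own addition to check the second line-ending format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the sample input as a string.
class SampleFileSketch {
    // Reads all integers separated by runs of ';', LF, or CRLF.
    static List<Integer> readInts(String input) {
        List<Integer> numbers = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));
            while (scanner.hasNextInt()) {
                numbers.add(scanner.nextInt());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String unix = "1\n\n\n2;3;;4\n5\n";      // the sample with LF endings
        String dos = unix.replace("\n", "\r\n"); // same content with CRLF endings

        System.out.println(readInts(unix)); // [1, 2, 3, 4, 5]
        System.out.println(readInts(dos));  // [1, 2, 3, 4, 5]
    }
}
```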

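I tried both "\n" and "\\n", and both worked in my case. I agree that if you need a literal backslash to reach the regex engine you have to double it, since \ is the escape character in Java string literals. It just so happens that here either form works: "\n" puts an actual line feed character into the pattern, which matches itself, while "\\n" passes the two characters \ and n to the regex engine, which interprets them as a line feed.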
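A quick way to convince yourself of this is the sketch below (the class and method names are hypothetical). It shows that the two string literals have different lengths, yet both compile to a pattern that matches a line feed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the "\n" versus "\\n" comparison.
class NewlineEscapeSketch {
    // True if the given regex source matches a single line feed character.
    static boolean matchesLineFeed(String regexSource) {
        return Pattern.compile(regexSource).matcher("\n").matches();
    }

    public static void main(String[] args) {
        String literal = "\n";  // the compiler turns this into one LF character
        String escaped = "\\n"; // two characters: a backslash followed by n

        System.out.println(literal.length()); // 1
        System.out.println(escaped.length()); // 2

        System.out.println(matchesLineFeed(literal)); // true
        System.out.println(matchesLineFeed(escaped)); // true

        // By contrast, a regex class such as \d has no single-backslash form:
        // "\d" is not a valid Java string escape, so it must be written "\\d".
    }
}
```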

Joshua McKinnon answered Sep 30 '22 14:09