
Java Scanner Delimiter

I'm using Scanner with a delimiter to tokenize my .txt file (it's a homework assignment). The first version of the file looks like this:

5,5,5,6,5,8,9,5,6,8, good, very good, excellent, good
7,7,8,7,6,7,8,8,9,7,very good, Good, excellent, very good
8,7,6,7,8,7,5,6,8,7 ,GOOD, VERY GOOD, GOOD, AVERAGE
9,9,9,8,9,7,9,8,9,9 ,Excellent, very good, very good, excellent
7,8,8,7,8,7,8,9,6,8 ,very good, good, excellent, excellent
6,5,6,4,5,6,5,6,6,6 ,good, average, good, good
7,8,7,7,6,8,7,8,6,6 ,good, very good, good,  very good
5,7,6,7,6,7,6,7,7,7  ,excellent, very good, very good, very good

For that version I used useDelimiter("[ ]*(,)[ ]*"). The second version of the file looks like this:

5 5 5 6 5 8 9 5 6 8 good, very good, excellent, good
7 7 8 7 6 7 8 8 9 7 very good, Good, excellent, very good
8 7 6 7 8 7  5 6 8 7 GOOD, VERY GOOD, GOOD, AVERAGE
9 9 9 8 9 7 9  8 9 9 Excellent, very good, very good, excellent
7 8 8 7 8 7 8 9 6 8 very good, good, excellent, excellent
6 5 6 4 5 6 5 6 6 6 good, average, good, good
7  8 7 7 6 8 7 8 6 6 good, very good, good,  very good
5 7 6 7 6 7 6 7 7 7  excellent, very good, very good, very good

I can't come up with a regex that separates the numbers by spaces and the words by commas. Essentially, I need an array with 14 values ("very good" being a single value).

Note there are multiple spaces (this is done on purpose to make it harder for us).

So any sort of help would be appreciated.

P.S. We're allowed to use delimiters only (no splits etc.).

Deividas Sutkus asked Feb 21 '13 16:02


2 Answers

This should work. The key is the positive lookbehind ((?<=...)) combined with alternation (|):

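// requires: import java.util.Scanner;
String input = "9 9 9 8 9 7 9  8 9 9 Excellent, very good, very good, excellent";
// Delimiter is either whitespace/commas that follow a digit,
// or a comma with optional whitespace on both sides.
Scanner s = new Scanner(input).useDelimiter("(?<=\\d)[\\s,]+|\\s*,\\s*");
while (s.hasNext()) {
    // The dots make leading or trailing whitespace in a token visible.
    System.out.println("Token: ." + s.next() + ".");
}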

Prints:

Token: .9.
Token: .9.
Token: .9.
Token: .8.
Token: .9.
Token: .7.
Token: .9.
Token: .8.
Token: .9.
Token: .9.
Token: .Excellent.
Token: .very good.
Token: .very good.
Token: .excellent.
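
The snippet above tokenizes a single line. For a whole file, a newline that follows a word is not covered by either branch of the delimiter, so one possible approach is to read the input line by line and run a second Scanner over each line. The following is only a sketch under that assumption: the class name RatingTokenizer, the helper tokenizeLine, and the inline sample data are made up for illustration, and trim() is used to drop stray whitespace at the ends of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RatingTokenizer {
    // Same delimiter as in the answer: whitespace/commas after a digit,
    // or a comma with optional whitespace around it.
    static final String DELIMITER = "(?<=\\d)[\\s,]+|\\s*,\\s*";

    // Hypothetical helper: returns the tokens of one line of the file.
    static List<String> tokenizeLine(String line) {
        List<String> tokens = new ArrayList<>();
        // trim() is an assumption: it removes whitespace at the ends of the line
        // so the first and last tokens come out clean.
        try (Scanner s = new Scanner(line.trim()).useDelimiter(DELIMITER)) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two sample lines copied from the second version of the file;
        // a real program would construct the outer Scanner from the .txt file instead.
        String data = "5 7 6 7 6 7 6 7 7 7  excellent, very good, very good, very good\n"
                    + "7  8 7 7 6 8 7 8 6 6 good, very good, good,  very good\n";
        try (Scanner lines = new Scanner(data)) {
            while (lines.hasNextLine()) {
                List<String> tokens = tokenizeLine(lines.nextLine());
                System.out.println(tokens.size() + " tokens: " + tokens);
            }
        }
    }
}
```

Because the first branch also accepts commas after a digit, the same delimiter should handle lines from the first version of the file as well.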
ach answered Sep 21 '22 17:09


You can try this one: ((?<=[0-9]+)\s*(?=[0-9]+))|(,\s*(?=[a-zA-Z]+))|((?<=[0-9]+)\s*(?=[a-zA-Z]+)). It looks awful but should work. Remember to double each backslash when writing it as a Java string literal.
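
The three branches of that pattern (whitespace between two digits, a comma before a word, whitespace between a digit and a word) can be tried out with a Scanner as in the sketch below. Note that this is a variation, not the exact regex from the answer: it assumes single-character lookarounds are enough, since only the character next to the delimiter matters, and it uses \s+ instead of \s* so the delimiter never matches an empty string. The class name BoundedLookaround and the method tokenize are hypothetical, and the pattern targets only the second (space-separated) version of the file.

```java
import java.util.Scanner;

public class BoundedLookaround {
    // Hypothetical variant of the three-branch pattern:
    //   1. whitespace between two digits
    //   2. a comma plus optional whitespace before a letter
    //   3. whitespace between a digit and a letter
    static final String DELIMITER =
        "(?<=[0-9])\\s+(?=[0-9])|,\\s*(?=[a-zA-Z])|(?<=[0-9])\\s+(?=[a-zA-Z])";

    // Fills an array of 14 values, the number the question expects per line.
    static String[] tokenize(String line) {
        String[] tokens = new String[14];
        int count = 0;
        try (Scanner s = new Scanner(line).useDelimiter(DELIMITER)) {
            while (s.hasNext() && count < tokens.length) {
                tokens[count++] = s.next();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample line copied from the second version of the file.
        String line = "7  8 7 7 6 8 7 8 6 6 good, very good, good,  very good";
        for (String token : tokenize(line)) {
            System.out.println("Token: ." + token + ".");
        }
    }
}
```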

Alexey A. answered Sep 18 '22 17:09