
How to split a string, including punctuation marks?

Tags:

java

I need to split a string (in Java) with punctuation marks being stored in the same array as words:

String sentence = "In the preceding examples, classes derived from...";
String[] split = sentence.split(" ");

I need the resulting array to be:

split[0] - "In"
split[1] - "the"
split[2] - "preceding"
split[3] - "examples"
split[4] - ","
split[5] - "classes"
split[6] - "derived"
split[7] - "from"
split[8] - "..."

Is there any elegant solution?

storojs72 asked Apr 25 '15 22:04


1 Answer

You need lookarounds:

String[] split = sentence.split(" ?(?<!\\G)((?<=[^\\p{Punct}])(?=\\p{Punct})|\\b) ?");

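Lookarounds assert a condition at the current position but (importantly here) don't consume any input when matching, so the punctuation is not swallowed as a delimiter and stays in the result as its own element.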
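To see the "assert but don't consume" behaviour in isolation, here is a minimal sketch that splits the same input once with a plain delimiter and once with zero-width lookarounds. The class and method names (LookaroundSplitDemo, consuming, nonConsuming) are made up for illustration and are not part of the answer's solution.

```java
import java.util.Arrays;

// Hypothetical demo class: contrasts a consuming delimiter with lookarounds.
class LookaroundSplitDemo {

    // Plain delimiter: the comma is consumed by the match and dropped.
    static String[] consuming(String s) {
        return s.split(",");
    }

    // Zero-width matches just before and just after a comma:
    // nothing is consumed, so the comma survives as its own element.
    static String[] nonConsuming(String s) {
        return s.split("(?=,)|(?<=,)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(consuming("a,b")));    // [a, b]
        System.out.println(Arrays.toString(nonConsuming("a,b"))); // [a, ,, b]
    }
}
```

The full pattern above follows the same idea, with extra handling for optional spaces and for runs of punctuation.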


Some test code:

String sentence = "Foo bar, baz! Who? Me...";
String[] split = sentence.split(" ?(?<!\\G)((?<=[^\\p{Punct}])(?=\\p{Punct})|\\b) ?");
Arrays.stream(split).forEach(System.out::println);

Output:

Foo
bar
,
baz
!
Who
?
Me
...
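
For convenience, the same pattern can be wrapped in a small helper and run against the sentence from the question. This is only a sketch: the class name PunctuationSplit and the method splitKeepingPunctuation are invented here, and the regex is the one from the answer, unchanged.

```java
import java.util.Arrays;

// Hypothetical wrapper around the regex from the answer.
class PunctuationSplit {

    // Splits at spaces and at word/punctuation boundaries.
    // (?<!\G) stops the pattern from matching again at the exact spot where
    // the previous match ended, which would otherwise yield empty elements.
    private static final String BOUNDARY =
            " ?(?<!\\G)((?<=[^\\p{Punct}])(?=\\p{Punct})|\\b) ?";

    static String[] splitKeepingPunctuation(String sentence) {
        return sentence.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String sentence = "In the preceding examples, classes derived from...";
        // Prints one element per line: In, the, preceding, examples, ",",
        // classes, derived, from, "..."
        Arrays.stream(splitKeepingPunctuation(sentence)).forEach(System.out::println);
    }
}
```

One caveat: \p{Punct} also covers the apostrophe, so a contraction such as "don't" comes back as three elements (don, ', t).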

Bohemian answered Sep 27 '22 18:09