I need to split a string in Java so that punctuation marks are stored in the same array as the words:
String sentence = "In the preceding examples, classes derived from...";
String[] split = sentence.split(" ");
I need the split array to be:
split[0] - "In"
split[1] - "the"
split[2] - "preceding"
split[3] - "examples"
split[4] - ","
split[5] - "classes"
split[6] - "derived"
split[7] - "from"
split[8] - "..."
Is there any elegant solution?
You need lookarounds:
String[] split = sentence.split(" ?(?<!\\G)((?<=[^\\p{Punct}])(?=\\p{Punct})|\\b) ?");
Lookarounds assert a condition at a position but (importantly here) don't consume any input when matching, so the characters around the split point are kept in the result.
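To see the difference on a smaller input, here is a minimal sketch comparing a consuming split on a comma with a split on zero-width lookarounds. The class and method names are hypothetical and only for illustration; they are not part of the regex above.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
public class LookaroundSplitDemo {

    // Consuming split: the comma is matched and dropped from the result.
    static String[] splitConsuming(String input) {
        return input.split(",");
    }

    // Zero-width split: matches the empty positions just before and just
    // after a comma, so the comma itself survives as its own element.
    static String[] splitKeeping(String input) {
        return input.split("(?=,)|(?<=,)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitConsuming("a,b"))); // [a, b]
        System.out.println(Arrays.toString(splitKeeping("a,b")));   // [a, ,, b]
    }
}
```

The full regex in this answer relies on the same idea, with extra handling for the optional spaces around each split point.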
Some test code:
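import java.util.Arrays;
String sentence = "Foo bar, baz! Who? Me...";
// Split at word/punctuation boundaries, swallowing an optional space on either side.
String[] split = sentence.split(" ?(?<!\\G)((?<=[^\\p{Punct}])(?=\\p{Punct})|\\b) ?");
Arrays.stream(split).forEach(System.out::println);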
Output:
Foo
bar
,
baz
!
Who
?
Me
...
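If the split regex feels hard to read, a different technique is to match the tokens instead of the separators. The following is only a sketch under that assumption, not the approach used above: the class name, method name, and token pattern are hypothetical choices. It treats a token as either a run of letters and digits or a run of ASCII punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class PunctuationTokenizer {

    // A token is a run of letters/digits, or a run of ASCII punctuation.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}+");

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        // Collect each match in order; whitespace is never matched, so it is skipped.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("Foo bar, baz! Who? Me...").forEach(System.out::println);
    }
}
```

For the test sentence this yields the same nine tokens as the output above. Note the design trade-off: adjacent punctuation marks such as "?!" come out as a single token, and an apostrophe inside a word splits that word into separate tokens.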