I have some strings, for example: I: am a string, with "punctuation".
I want to split the string into:
["I", ":", "am", "a", "string", ",", "with", "\"", "punctuation", "\"", "."]
I tried text.split("[\\p{Punct}\\s]+")
but the result is I, am, a, string, with, punctuation
...
I found this solution, but Java doesn't allow me to split by \w.
Use this regex:
"\\s+|(?=\\p{Punct})|(?<=\\p{Punct})"
The result on your string:
["I", ":", "am", "a", "string", ",", "with", "", "\"", "punctuation", "\"", "."]
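A minimal, self-contained Java sketch that reproduces the result above. The class name PunctSplitDemo and the tokens helper are hypothetical, for illustration only:

```java
import java.util.Arrays;

public class PunctSplitDemo {
    // The answer's regex: split on runs of whitespace, or at the zero-width
    // position just before or just after any punctuation character.
    static final String REGEX = "\\s+|(?=\\p{Punct})|(?<=\\p{Punct})";

    static String[] tokens(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        String text = "I: am a string, with \"punctuation\".";
        // Prints [I, :, am, a, string, ,, with, , ", punctuation, ", .]
        System.out.println(Arrays.toString(tokens(text)));
    }
}
```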
Unfortunately, there is an extra element: the empty string "" after "with". These extra elements only occur (and always occur) when a punctuation character follows a whitespace character. The \\s+ match ends exactly where the (?=\\p{Punct}) lookahead matches, and that zero-width split produces an empty string. This can be fixed by doing myString.replaceAll("\\s+(?=\\p{Punct})", "").split(regex);
instead of myString.split(regex);
(i.e. strip out the whitespace that precedes punctuation, then split)
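A similar sketch for the whitespace-stripping variant (StripThenSplit is a hypothetical class name). The replaceAll call removes any whitespace directly before a punctuation character, so \\s+ can no longer end at the same position where the lookahead matches:

```java
import java.util.Arrays;

public class StripThenSplit {
    static final String REGEX = "\\s+|(?=\\p{Punct})|(?<=\\p{Punct})";

    static String[] tokens(String s) {
        // First drop whitespace that is immediately followed by punctuation,
        // then split with the same regex as before.
        return s.replaceAll("\\s+(?=\\p{Punct})", "").split(REGEX);
    }

    public static void main(String[] args) {
        String text = "I: am a string, with \"punctuation\".";
        // Prints [I, :, am, a, string, ,, with, ", punctuation, ", .]
        System.out.println(Arrays.toString(tokens(text)));
    }
}
```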
How this works:
\\s+ splits on a run of whitespace: those characters are removed and the string is split at that location. (Note: I am assuming that hello and world separated by two spaces should result in ["hello", "world"] rather than ["hello", "", "world"].)
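The difference between \\s and \\s+ shows up with a small sketch. The two-space input is assumed for illustration, and WhitespaceRuns with its two helpers is a hypothetical name:

```java
import java.util.Arrays;

public class WhitespaceRuns {
    // "\\s" treats every single whitespace character as a delimiter.
    static String[] splitEach(String s) {
        return s.split("\\s");
    }

    // "\\s+" treats a whole run of whitespace as one delimiter.
    static String[] splitRuns(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "hello  world"; // two spaces
        System.out.println(Arrays.toString(splitEach(text))); // [hello, , world]
        System.out.println(Arrays.toString(splitRuns(text))); // [hello, world]
    }
}
```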
(?=\\p{Punct}) is a lookahead that splits if the next character is a punctuation character, but it doesn't remove that character.
(?<=\\p{Punct}) is a lookbehind that splits if the previous character is a punctuation character; it doesn't remove the character either.
EDIT:
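In response to your comment, this regex should allow punctuation within words:
"\\s+|(?<=\\S)(?=\\p{Punct})(?!(?<=\\w)\\p{Punct}\\w)|(?<=\\p{Punct})(?=\\S)(?!(?<=\\w\\p{Punct})\\w)"
For this one, you don't need to use the replaceAll; simply do myString.split(regex).
How it works:
\\s+ still splits on runs of whitespace. (?<=\\S)(?=\\p{Punct}) splits just before a punctuation character, and (?<=\\p{Punct})(?=\\S) splits just after one, but each only fires when the character on the other side of the split is not whitespace. Whitespace next to punctuation is therefore consumed by \\s+ alone, so a zero-width split never lands right after a whitespace match and no empty strings appear.
The negative lookaheads (?!(?<=\\w)\\p{Punct}\\w) and (?!(?<=\\w\\p{Punct})\\w) cancel the split when the punctuation character has a word character on both sides, so a word like don't stays in one piece.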
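A minimal sketch that runs a punctuation-within-words split on the question's string and on don't stop (an extra input assumed here to exercise punctuation inside a word). It assumes "punctuation within words" means a punctuation character with a word character on both sides; InWordPunctSplit is a hypothetical class name:

```java
import java.util.Arrays;

public class InWordPunctSplit {
    // Split on whitespace runs, or at a zero-width position next to a
    // punctuation character, unless that character has a word character
    // on both sides (as in "don't").
    static final String REGEX =
            "\\s+"
            // Before punctuation, only if the previous character is not whitespace.
            + "|(?<=\\S)(?=\\p{Punct})(?!(?<=\\w)\\p{Punct}\\w)"
            // After punctuation, only if the next character is not whitespace.
            + "|(?<=\\p{Punct})(?=\\S)(?!(?<=\\w\\p{Punct})\\w)";

    static String[] tokens(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        // Prints [I, :, am, a, string, ,, with, ", punctuation, ", .]
        System.out.println(Arrays.toString(tokens("I: am a string, with \"punctuation\".")));
        // Prints [don't, stop]
        System.out.println(Arrays.toString(tokens("don't stop")));
    }
}
```

By default Java's \\w and \\p{Punct} only cover ASCII, so accented letters count as non-word characters; the Pattern.UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) makes both classes Unicode-aware.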