I have some text and I'm using this simple regex to split it into words: [ \n]. It splits the text on spaces and line breaks.
I want to know if there is a way to keep the space or the line break in each split word, because I will use it for simple sentence detection after some processing.
I'm using the String#split method.
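To make the problem concrete, here is a minimal sketch of what the plain split does. The class name, method name, and sample text are made up for illustration; only String#split and the regex [ \n] come from the question.

```java
import java.util.Arrays;

class PlainSplitDemo {
    // Splits on a single space or line break. The separator is matched
    // (consumed) by the regex, so it does not appear in any piece.
    static String[] splitWords(String text) {
        return text.split("[ \n]");
    }

    public static void main(String[] args) {
        String text = "one two\nthree";
        // Prints [one, two, three]: both the space and the line break are gone.
        System.out.println(Arrays.toString(splitWords(text)));
    }
}
```

After this split there is no way to tell which word was followed by a line break, which is the information needed for the sentence detection step.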
You can use a lookbehind, as @Piotr Findeisen suggested (+1):
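public class RegexExample {
    public static void main(String[] args) {
        String s = "firstWordWithSpaceAfter secondWordWithSpaceAfter wordWithLineBreakAfter\nlastWord";
        // Split after each space or line break. The lookbehind consumes nothing,
        // so the separator stays at the end of the preceding piece.
        String[] sa = s.split("(?<=[ \\n])");
        for (String saa : sa) {
            System.out.println("[" + saa + "]");
        }
    }
}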
Output:
[firstWordWithSpaceAfter ]
[secondWordWithSpaceAfter ]
[wordWithLineBreakAfter
]
[lastWord]
Short explanation:
(?<=...) is a lookbehind: it matches at a position only if the text immediately before that position matches the regex that follows ?<= (in this case [ \\n]).
[ \\n] is a character class, meaning any one of the characters inside the [], here a space or a line break.
So the whole regex says: split at every position where the preceding character is either a space or \n.
Since the lookbehind only checks for the space or \n and does not consume it, split does not remove them.
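The sketch below illustrates how the same lookbehind behaves in two edge cases: consecutive separators and a trailing separator. The class and helper names are hypothetical and exist only for this example; the regex is the one from the answer.

```java
class KeepSeparatorDemo {
    // Hypothetical helper: splits after each space or line break,
    // so every separator stays attached to the piece before it.
    static String[] splitKeepingSeparators(String text) {
        return text.split("(?<=[ \\n])");
    }

    public static void main(String[] args) {
        // Two consecutive spaces: the second space becomes a piece of its own,
        // giving the pieces "a ", " " and "b".
        String[] doubled = splitKeepingSeparators("a  b");
        System.out.println(doubled.length); // prints 3

        // Trailing line break: it stays on the last word, giving "a " and "b\n".
        String[] trailing = splitKeepingSeparators("a b\n");
        System.out.println(trailing[1].endsWith("\n")); // prints true

        // Nothing is consumed, so joining the pieces restores the input.
        System.out.println(String.join("", doubled).equals("a  b")); // prints true
    }
}
```

Because no character is dropped, a later step can check something like piece.endsWith("\n") to decide where a line ended. If runs of whitespace should stay together instead of producing separate pieces, the pattern would need to be adjusted.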