Say I have a string such as the following:
x <- 'The world is at end. What do you think? I am going crazy! These people are too calm.'
I need to split only on the punctuation characters !, ? and . when they are followed by whitespace, and keep the punctuation attached to each piece.
The following removes the punctuation, though, and also leaves leading spaces in the split parts:
vec <- strsplit(x, '[!?.][:space:]*')
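For comparison, here is the same attempt with the POSIX whitespace class written the way R's default regex engine expects it, inside a bracket expression: [[:space:]]. The leading spaces go away, but strsplit() still removes everything the pattern matches. Any pattern that consumes the punctuation will therefore drop it from the pieces.

```r
x <- 'The world is at end. What do you think? I am going crazy! These people are too calm.'

# [[:space:]] is the POSIX whitespace class; it only works inside [ ]
parts <- strsplit(x, '[!?.][[:space:]]*')[[1]]
print(parts)
# The matched text (punctuation plus following spaces) is removed from each piece
```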
How can I split sentences leaving the punctuation?
You can switch to PCRE by setting perl=TRUE and split on whitespace using a lookbehind assertion:
strsplit(x, '(?<![^!?.])\\s+', perl=TRUE)
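Applied to x from the question, this gives one element per sentence with the punctuation kept. The lookbehind only inspects the preceding character and does not consume it, so only the whitespace is removed. strsplit() returns a list with one element per input string, so [[1]] extracts the character vector:

```r
x <- 'The world is at end. What do you think? I am going crazy! These people are too calm.'

# Split on whitespace that is preceded by !, ? or . (PCRE lookbehind)
sentences <- strsplit(x, '(?<![^!?.])\\s+', perl = TRUE)[[1]]
print(sentences)
# sentences is c("The world is at end.", "What do you think?",
#                "I am going crazy!", "These people are too calm.")
```

Because strsplit() is vectorized, passing a character vector of several paragraphs returns one list element per paragraph.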
Regular expression:
(?<! look behind to see if there is not:
[^!?.] any character except: '!', '?', '.'
) end of look-behind
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or more times)
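The double negative ("not preceded by a character other than !, ? or .") is close to the positive lookbehind (?<=[!?.]), but it is not identical. The negative form also succeeds at the very start of the string, where there is no preceding character at all. On x both patterns give the same result. They differ only when the input starts with whitespace. The comparison below is a small sketch; y is a made-up input used only for illustration:

```r
x <- 'The world is at end. What do you think? I am going crazy! These people are too calm.'
y <- '  Leading spaces. Then more.'  # hypothetical input with leading whitespace

neg <- '(?<![^!?.])\\s+'  # pattern from the answer
pos <- '(?<=[!?.])\\s+'   # positive lookbehind alternative

# Both patterns split x into the same sentences
same_on_x <- identical(strsplit(x, neg, perl = TRUE), strsplit(x, pos, perl = TRUE))

# On y the negative lookbehind also matches the leading spaces,
# so strsplit() returns an empty string as the first element
y_neg <- strsplit(y, neg, perl = TRUE)[[1]]  # c("", "Leading spaces.", "Then more.")
y_pos <- strsplit(y, pos, perl = TRUE)[[1]]  # c("  Leading spaces.", "Then more.")
```

Which behaviour is preferable depends on whether leading whitespace should be stripped (negative form, followed by dropping the empty first element) or kept on the first sentence (positive form).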