I have asked related questions HERE and HERE. I tried to generalize these answers but have failed.
Basically, I have a string that I want to split into words, numbers, and individual punctuation marks, while keeping apostrophes inside the words they belong to. Here is what I've tried, and I think I'm close:
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
## [1] "Raptors" "don" "'" "t" "like" "robots" "!"
## [8] "" "I" "'" "d" "pay" "$" "500" "." "00" "to"
## [18] "rid" "them" "."
Here's what I'm after:
## [[1]]
## [1] "Raptors" "don't" "like" "robots" "!" "" "I'd"
## [8] "pay" "$" "500" "." "00" "to" "rid" "them" "."
While I want a base R solution, I would also like to see other approaches (I'm sure someone has a stringr solution), which would make the question more useful to others.
Note: R has a specific regex system. You will want to be familiar with R to answer this question.
You could use a negative lookahead, (?!'), so that the zero-width punctuation match is rejected whenever the next character is an apostrophe:
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
# [1] "Raptors" "don't" "like" "robots" "!" "" "I'd" "pay" "$" "500" "." "00" "to" "rid" "them" "."
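As a possible base R alternative, you could match the tokens directly instead of splitting on delimiters. The sketch below is not part of the approach above; the name token_pattern is just illustrative, and it assumes that an apostrophe should always stay attached to the neighbouring letters or digits.

```r
# Same input string as in the question.
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Match tokens rather than split on delimiters:
#   [[:alnum:]']+            a run of letters, digits, or apostrophes
#   [^[:alnum:][:space:]']   any other single non-space character
token_pattern <- "[[:alnum:]']+|[^[:alnum:][:space:]']"

# gregexpr() finds every match; regmatches() extracts the matched text.
tokens <- regmatches(x, gregexpr(token_pattern, x))[[1]]
print(tokens)
```

Because whitespace is never matched, this version does not produce the empty string that the strsplit() results contain after "!". One caveat: a quoted word such as 'hello' would keep its quote marks attached, since the pattern cannot tell a quote from an apostrophe.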