I'm able to remove all punctuation from a string while keeping apostrophes, but I'm now stuck on how to remove any apostrophes that are not between two letters.
str1 <- "I don't know 'how' to remove these ' things"
The result should look like this:
"I don't know how to remove these things"
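The question does not show how the first step (removing all punctuation except apostrophes) was done. Here is a minimal sketch of one way to do it. It assumes "punctuation" means anything that is not a letter, digit, whitespace or '. The helper name `keep_apostrophes` is hypothetical.

```r
# Hypothetical helper: keep only letters, digits, whitespace and the
# apostrophe; every other character (commas, periods, "!" etc.) is dropped.
keep_apostrophes <- function(x) gsub("[^[:alnum:][:space:]']", "", x)

keep_apostrophes("Hello, world! It's 'fine'.")
#> [1] "Hello world It's 'fine'"
```

The stray quotes around `fine` survive this step, which is exactly what the answers below deal with.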
You may use a regex approach:
str1 <- "I don't know 'how' to remove these ' things"
gsub("\\s*'\\B|\\B'\\s*", "", str1)
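A small self-contained check of the same pattern, wrapped in a helper so it can be applied to a whole character vector. The name `strip_stray_apostrophes` is hypothetical. This sketch passes `perl = TRUE` (PCRE), whereas the call above uses R's default regex engine. The pattern itself is unchanged.

```r
# Hypothetical helper wrapping the pattern from the answer.
strip_stray_apostrophes <- function(x) {
  # Alternative 1: optional whitespace, ', then a non-word boundary
  #   (the ' is not followed by a word character, e.g. a closing quote).
  # Alternative 2: a non-word boundary, ', then optional whitespace
  #   (the ' is not preceded by a word character, e.g. an opening quote).
  gsub("\\s*'\\B|\\B'\\s*", "", x, perl = TRUE)
}

str1 <- "I don't know 'how' to remove these ' things"
strip_stray_apostrophes(str1)
#> [1] "I don't know how to remove these things"

# gsub() is vectorised, so several strings can be cleaned at once
strip_stray_apostrophes(c("'tis", "it's", "'quoted' word"))
#> [1] "tis"         "it's"        "quoted word"
```

Note that a leading apostrophe as in `'tis` is also removed, since it is not between two word characters.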
The regex matches:
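- \\s*'\\B - zero or more whitespace characters, then ', then a non-word boundary (the ' is not followed by a word character)
- | - or
- \\B'\\s* - a non-word boundary, then ', then zero or more whitespace characters (the ' is not preceded by a word character)

If you do not need to care about the extra whitespace that can remain after removing a standalone ', you can use a PCRE regex like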
\b'\b(*SKIP)(*F)|'
Explanation:
- \b'\b - match a ' between two word characters
- (*SKIP)(*F) - discard that match and continue searching after it
- | - or match...
- ' - an apostrophe in any other context.

In R, this pattern requires perl=TRUE:
gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
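To make the whitespace caveat concrete, the sketch below shows that the (*SKIP)(*F) pattern leaves a double space where the standalone ' used to be. The follow-up `gsub(" {2,}", " ", ...)` is an extra cleanup step added here for illustration. It is not part of the original answer, and it assumes collapsing any run of spaces is acceptable for your data.

```r
str1 <- "I don't know 'how' to remove these ' things"

# Apostrophes between word characters are matched, then discarded by
# (*SKIP)(*F); every other apostrophe is replaced with "".
res <- gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl = TRUE)
res
#> [1] "I don't know how to remove these  things"

# Optional extra step: collapse runs of two or more spaces into one
gsub(" {2,}", " ", res)
#> [1] "I don't know how to remove these things"
```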
To account for apostrophes between Unicode letters, add the (*UTF)(*UCP) flags at the start of the pattern and pass perl=TRUE:
gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", str1, perl=TRUE)
Or
gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
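A quick check of both Unicode-aware patterns on a string with an accented letter. The sample text `l'été 'ici'` is made up for this sketch. It is written with `\u` escapes so that R stores it as UTF-8. With (*UCP), `é` counts as a word character, so the apostrophe in `l'été` is kept and the quotes around `ici` are removed.

```r
# "l'été 'ici'" written with \u escapes so the string is UTF-8 encoded
x <- "l'\u00e9t\u00e9 'ici'"

# (*UTF)(*UCP) make \b, \B and \s Unicode-aware in PCRE
v1 <- gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", x, perl = TRUE)
v2 <- gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", x, perl = TRUE)

# Both should give "l'été ici"
v1
v2
```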