I'm cleaning text strings in R. I want to remove all the punctuation except apostrophes and hyphens. This means I can't use the [:punct:] character class (unless there's a way of saying [:punct:] but not '-).
! " # $ % & ( ) * + , . / : ; < = > ? @ [ \ ] ^ _ { | } ~ and backtick must come out.
For most of the above, escaping is not an issue. But for square brackets, I'm really having issues. Here's what I've tried:
gsub('[abc]', 'L', 'abcdef') #expected behaviour, shown as sanity check
# [1] "LLLdef"
gsub('[[]]', 'B', 'it[]') #only 1 substitution, ie [] treated as a single character
# [1] "itB"
gsub('[\[\]]', 'B', 'it[]') #single escape, errors as expected
Error: '\[' is an unrecognized escape in character string starting "'[\["
gsub('[\\[\\]]', 'B', 'it[]') #double escape, single substitution
# [1] "itB"
gsub('[\\]\\[]', 'B', 'it[]') #double escape, reversed order, NO substitution
# [1] "it[]"
I'd prefer not to use fixed=TRUE with gsub, since that would prevent me from using a character class. So, how do I include square brackets in a regex character class?
ETA additional trials:
gsub('[[\\]]', 'B', 'it[]') #double escape on closing ] only, single substitution
# [1] "itB"
gsub('[[\]]', 'B', 'it[]') #single escape on closing ] only, expected error
Error: '\]' is an unrecognized escape in character string starting "'[[\]"
ETA: the single substitution was caused by not setting perl=T in my gsub calls, i.e.:
gsub('[[\\]]', 'B', 'it[]', perl=T)
You can use [:punct:] if you combine it with a negative lookahead:
(?!['-])[[:punct:]]
This way a [:punct:] character is matched only if it is not in ['-]. The negative lookahead assertion (?!['-]) enforces this condition: it fails when the next character is a ' or a -, and then the complete expression fails at that position.
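In R, the lookahead pattern could be applied roughly as follows. This is a minimal sketch: the sample string x is made up for illustration, and perl = TRUE is passed because lookaheads are a Perl-style feature that R's default regex engine does not handle.

```r
# Hypothetical sample input, not taken from the original question
x <- "it's a well-known [fact], isn't it? (yes!)"

# perl = TRUE switches gsub to PCRE, which supports lookahead assertions
cleaned <- gsub("(?!['-])[[:punct:]]", "", x, perl = TRUE)

print(cleaned)
```

Apostrophes and hyphens survive, while the brackets, comma, question mark, parentheses and exclamation mark are removed.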
Inside a character class you only need to escape the closing square bracket. Try using '[[\\]]' or '[[\]]'. (I am not sure about escaping the backslash as I don't know R.)
See this example.
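For R specifically, the question's own follow-up points to the doubled backslash combined with perl = TRUE. Here is a short sketch of that, alongside the POSIX convention of listing ] first in the class (described in ?regex), which should work with the default engine as well:

```r
# PCRE: escape only the closing bracket; the backslash is doubled
# because it sits inside an R string literal
pcre_result <- gsub('[[\\]]', 'B', 'it[]', perl = TRUE)

# Default (POSIX-style) engine: the backslash does not escape ] inside a
# bracket expression, so put ] first in the class to make it literal
posix_result <- gsub('[][]', 'B', 'it[]')

print(pcre_result)
print(posix_result)
```

Under the default engine, '[[\\]]' appears to parse as the class [[\] followed by a literal ], which would explain why the earlier attempts replaced the pair [] once instead of replacing each bracket separately.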