Given
test<-"Low-Decarie, Etienne"
I wish to replace all punctuation with spaces:
gsub(pattern="[:punct:]", x=test, replacement=" ")
but this produces
"Low-De arie, E ie  e"
where no punctuation is replaced and seemingly random letters are removed (though they may be associated with punctuation, such as t for tab and n for newline).
The gsub() function in R replaces every match of a pattern in a string. It takes a pattern, a replacement, and an input vector, and returns the input with each match substituted. By default gsub() interprets the pattern as a regular expression; pass fixed = TRUE to have it matched as a literal string instead.
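To use a special character literally in a regular expression, the simplest method is usually to escape it with a backslash. Because the backslash is itself the escape character in R string literals, it has to be doubled in the pattern string (for example, "\\." matches a literal dot). To match a literal backslash you need to escape twice, which results in four backslashes in the R source.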
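A minimal sketch of the escaping rules described above. The sample string x and the result names are made up for illustration; only base R functions are used.

```r
# x is the 5-character string a.b\c (the literal contains one backslash)
x <- "a.b\\c"

# An unescaped dot matches any character, so everything is replaced
all_replaced <- gsub(".", "-", x)

# Escaping the dot: two backslashes in R source give one in the regex
dot_only <- gsub("\\.", "-", x)

# Matching a literal backslash takes four backslashes in R source
backslash_only <- gsub("\\\\", "/", x)

# fixed = TRUE skips regex interpretation, so no escaping is needed
dot_fixed <- gsub(".", "-", x, fixed = TRUE)
```

Printing the results with cat() rather than print() shows the strings without R's own escaping of backslashes.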
A 'regular expression' is a pattern that describes a set of strings. Two types of regular expressions are used in R: extended regular expressions (the default) and Perl-like regular expressions, enabled with perl = TRUE.
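As a rough illustration of the two flavours, the sketch below runs the same POSIX class under both engines and then uses two features that, as far as I know, need perl = TRUE: a lookahead and the \U case conversion in the replacement. The variable names are only for illustration.

```r
test <- "Low-Decarie, Etienne"

# POSIX character classes work with the default engine and with perl = TRUE
default_engine <- gsub("[[:punct:]]", " ", test)
perl_engine    <- gsub("[[:punct:]]", " ", test, perl = TRUE)

# Lookahead is a Perl-style feature: replace a hyphen only when "D" follows
hyphen_before_D <- gsub("-(?=D)", " ", test, perl = TRUE)

# \U in the replacement (perl = TRUE only) upper-cases the captured text
shouting <- gsub("(\\w+)", "\\U\\1", "low decarie", perl = TRUE)
```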
The regexpr() function gives you, for each string, (a) the index where the match begins and (b) the length of that match. regexpr() only reports the first match in each string (reading left to right). gregexpr() gives you all of the matches in a given string if there is more than one.
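A short sketch of the difference, applied to the string from the question. The result names are illustrative; regmatches() is the base R helper for pulling out the matched text.

```r
test <- "Low-Decarie, Etienne"

# regexpr(): position and length of the first punctuation character only
first <- regexpr("[[:punct:]]", test)
first_pos <- as.integer(first)              # 4 (the hyphen)
first_len <- attr(first, "match.length")    # 1

# gregexpr(): every match, returned as one list element per input string
all_matches <- gregexpr("[[:punct:]]", test)
all_pos <- as.integer(all_matches[[1]])     # 4 12 (hyphen and comma)

# regmatches() extracts the matched substrings themselves
found <- regmatches(test, all_matches)[[1]] # "-" ","
```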
Fellow MontReal user here.
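Several options, same result.
In base R, just double the brackets. [:punct:] is only a character class when it appears inside a bracket expression; a bare "[:punct:]" is read as the set of characters :, p, u, n, c and t, which is why those letters disappeared.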
gsub(pattern="[[:punct:]]", test, replacement=" ")
[1] "Low Decarie  Etienne"
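To see why the single-bracket version misbehaves, here is a small sketch comparing it with an ordinary character set made of the same letters. The variable names are just for illustration.

```r
test <- "Low-Decarie, Etienne"

# Single brackets: an ordinary set containing ':', 'p', 'u', 'n', 'c', 't'
wrong <- gsub("[:punct:]", " ", test)

# The same set written without the trailing colon behaves identically
same_set <- gsub("[:punct]", " ", test)

# Double brackets: the POSIX punctuation class inside a bracket expression
right <- gsub("[[:punct:]]", " ", test)
```

Here wrong and same_set are the same string, with c, t and n blanked out, while right has only the hyphen and the comma replaced. Note the two consecutive spaces in right, because the comma was already followed by a space.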
The stringr package has a function, str_replace_all, that does the same thing.
library(stringr)
str_replace_all(test, "[[:punct:]]", " ")
Or keep only letters and digits:
str_replace_all(test, "[^[:alnum:]]", " ")