
In R, how to use regex [:punct:] in gsub?

Tags: regex, r

Given

test<-"Low-Decarie, Etienne"

I wish to replace all punctuation with spaces:

gsub(pattern="[:punct:]", x=test, replacement=" ")

but this produces

"Low-De arie, E ie  e"

where no punctuation is replaced and apparently random letters are replaced instead (though they may be associated with punctuation, as t for tab and n for newline).

Etienne Low-Décarie asked May 24 '12 14:05



1 Answer

Fellow MontReal user here.

Several options, same result.

In base R, just double the brackets: [:punct:] is only recognized as a character class inside a bracket expression.

gsub(pattern="[[:punct:]]", test, replacement=" ")

[1] "Low Decarie  Etienne"
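As a hedged illustration of why the single-bracket pattern behaves the way the question describes: with R's default regex engine, "[:punct:]" on its own appears to be read as an ordinary bracket expression containing the literal characters :, p, u, n, c and t. The sketch below reuses test from the question; the variable names single, explicit and double are made up for this example.

```r
test <- "Low-Decarie, Etienne"

# Single brackets: treated as the set of literal characters : p u n c t
single <- gsub(pattern = "[:punct:]", replacement = " ", x = test)

# The same set written out explicitly, to check that interpretation
explicit <- gsub(pattern = "[punct:]", replacement = " ", x = test)

# Double brackets: the POSIX class [:punct:] inside a bracket expression
double <- gsub(pattern = "[[:punct:]]", replacement = " ", x = test)

print(single)
print(explicit)
print(double)
```

If the interpretation holds, single and explicit are identical, which accounts for the c, t and n characters disappearing in the question's output.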

The stringr package has the function str_replace_all, which does the same:

library(stringr)
str_replace_all(test, "[[:punct:]]", " ")

Or keep only letters and digits, replacing everything else:

str_replace_all(test, "[^[:alnum:]]", " ")
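The two patterns give the same result on test, but they are not equivalent in general: [^[:alnum:]] also matches whitespace, while [[:punct:]] leaves it alone. A small sketch in base R, using a made-up string x that is not from the question:

```r
# Hypothetical input with an underscore, a space, a tab and a comma
x <- "a_b c\t1,2"

# Only punctuation (_ and ,) is replaced; the space and tab survive
punct_only <- gsub("[[:punct:]]", " ", x)

# Everything that is not a letter or digit is replaced, including the tab
non_alnum <- gsub("[^[:alnum:]]", " ", x)

print(punct_only)
print(non_alnum)
```

Which one to prefer depends on whether whitespace such as tabs should be normalized to plain spaces as well.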
Pierre Lapointe answered Nov 08 '22 02:11