
Pattern matching and replacement in R

Tags: regex, r, gsub

I am not familiar at all with regular expressions, and would like to do pattern matching and replacement in R.

I would like to replace the elements "#1" and "#2" in the vector original = c("#1", "#2", "#10", "#11") with the corresponding values of the vector vec = c(1,2), leaving "#10" and "#11" untouched.

The result I am looking for is the following vector: c("1", "2", "#10", "#11"). I am not sure how to do that. I tried:

for(i in 1:2) {
    pattern = paste("#", i, sep = "")
    original = gsub(pattern, vec[i], original, fixed = TRUE)
}

but I get:

#> original
#[1] "1"  "2"  "10" "11"

instead of: "1" "2" "#10" "#11"

I would appreciate any help I can get! Thank you!

Mayou asked Nov 26 '13 14:11



2 Answers

Specify that you are matching the entire string from start (^) to end ($).

Here, I've matched exactly the conditions you are looking at in this example, but I'm guessing you'll need to extend it:

> gsub("^#([1-2])$", "\\1", original)
[1] "1"   "2"   "#10" "#11"

So, that's basically: "From the start of the string, look for a hash symbol followed by either the digit one or the digit two. That must be exactly one digit (that's why we don't use a quantifier such as * or +), and it must also end the string. Oh, and capture that digit in parentheses, because we want to 'backreference' it as \\1 in the replacement."
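To see why the loop in the question also changed "#10" and "#11": with fixed = TRUE the pattern "#1" is matched as a plain substring, and that substring occurs at the start of both strings. The following is a minimal base R sketch of the same loop with anchored patterns, so that the values still come from vec. The names substring_hits and result are illustrative only and do not appear in the question.

```r
original <- c("#1", "#2", "#10", "#11")
vec <- c(1, 2)

# Plain substring matching: "#1" is also found inside "#10" and "#11".
substring_hits <- grepl("#1", original, fixed = TRUE)

# Anchor each pattern with ^ and $ so only whole-string matches are replaced.
result <- original
for (i in seq_along(vec)) {
  pattern <- paste0("^#", i, "$")          # e.g. "^#1$"
  result <- sub(pattern, vec[i], result)   # replacement is coerced to character
}
result
```

Note that fixed = TRUE has to be dropped here, since ^ and $ only act as anchors when the pattern is interpreted as a regular expression.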

A5C1D2H2I1M1N2O1R2T1 answered Sep 26 '22 10:09



Another option using gsubfn:

library(gsubfn)
gsubfn("^#([1-2])$",  I, original)   ## Function substituting
[1] "1"   "2"   "#10" "#11"

Or, if you want to explicitly use the values of your vector vec, pass a named list that maps each match to its replacement:

gsubfn("^#[1-2]$",  as.list(setNames(vec,c("#1", "#2"))), original) 

Or, using formula notation, which is equivalent to the function notation:

gsubfn("^#([1-2])$",  ~ x, original)   ## formula substituting
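If installing gsubfn is not an option, the named-list idea above can be approximated in base R with exact string matching instead of a regular expression. This is only a sketch under the assumption that each replacement is keyed by the whole string; lookup, hit and result are illustrative names.

```r
original <- c("#1", "#2", "#10", "#11")
vec <- c(1, 2)

# Map "#1" -> "1" and "#2" -> "2"; the names are the strings to replace.
lookup <- setNames(as.character(vec), paste0("#", seq_along(vec)))

# Replace only the elements that equal one of the names exactly.
hit <- original %in% names(lookup)
result <- original
result[hit] <- lookup[original[hit]]
result
```

Because %in% compares whole strings, "#10" and "#11" can never be hit by the entry for "#1", so no anchoring is needed.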
agstudy answered Sep 24 '22 10:09
