
Regular expression to find and replace conditionally

Tags: regex, r

I need to replace string A with string B only when string A is a whole word (e.g. "MECH"); I don't want to make the replacement when A is part of a longer string (e.g. "MECHANICAL"). So far, I have a grepl() call that checks whether string A appears as a whole word, but I cannot figure out how to make the replacement. I added an ifelse() with the idea of making the gsub() replacement when grepl() returns TRUE and leaving the string unchanged otherwise. Any suggestions? Please see the code below. Thanks.

aa <- data.frame(type = c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH", "MECH CONSTR", "MECHCONSTRUCTION"))

from <- c("MECH", "MECHANICAL", "CONSTR",  "CONSTRUCTION")
to <- c("MECHANICAL", "MECHANICAL", "CONSTRUCTION", "CONSTRUCTION")

gsub2 <- function(pattern, replacement, x, ...) {
  for(i in 1:length(pattern)){
    reg <- paste0("(^", pattern[i], "$)|(^", pattern[i], " )|( ", pattern[i], "$)|( ", pattern[i], " )")
    ifelse(grepl(reg, aa$type),
           x <- gsub(pattern[i], replacement[i], x, ...),
           aa$type)
  }
  x
}

aa$title3 <- gsub2(from, to, aa$type)
vatodorov asked Nov 02 '22 16:11


1 Answer

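You can enclose the strings in the from vector in \\< and \\> to match only whole words. In the regex these are \< and \>, which match the empty string at the beginning and end of a word, so "MECH" inside "MECHANICAL" or "MECHCONSTRUCTION" is left alone: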

x <- c("CONSTR", "MECH CONSTRUCTION", "MECHANICAL CONSTRUCTION MECH", 
       "MECH CONSTR", "MECHCONSTRUCTION")

from <- c("\\<MECH\\>", "\\<CONSTR\\>")
to <- c("MECHANICAL", "CONSTRUCTION")

for (i in seq_along(from)) {
  x <- gsub(from[i], to[i], x)
}

print(x)
# [1] "CONSTRUCTION"                       "MECHANICAL CONSTRUCTION"           
# [3] "MECHANICAL CONSTRUCTION MECHANICAL" "MECHANICAL CONSTRUCTION"           
# [5] "MECHCONSTRUCTION"
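
If you would rather keep the from vector as plain words and write the result back to the data frame from the question, the same idea can be wrapped in a small function. This is only a sketch: replace_whole_words is a hypothetical name, it uses \\b (a word boundary on either side) with perl = TRUE instead of \\< and \\>, and it assumes the entries of from contain no regex metacharacters. Replacements are applied in order, so it also assumes no replacement creates a word that a later pattern would match again.

```r
# Hypothetical helper (not part of the original answer): wraps each plain
# word in \b ... \b so only whole-word occurrences are replaced.
replace_whole_words <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # Assumes from[i] has no regex metacharacters that need escaping.
    pattern <- paste0("\\b", from[i], "\\b")
    x <- gsub(pattern, to[i], x, perl = TRUE)
  }
  x
}

# Data frame from the question.
aa <- data.frame(type = c("CONSTR", "MECH CONSTRUCTION",
                          "MECHANICAL CONSTRUCTION MECH",
                          "MECH CONSTR", "MECHCONSTRUCTION"),
                 stringsAsFactors = FALSE)

# Only the abbreviations need to be listed; full words are left untouched.
aa$title3 <- replace_whole_words(aa$type,
                                 from = c("MECH", "CONSTR"),
                                 to   = c("MECHANICAL", "CONSTRUCTION"))
```

With the sample data, aa$title3 should hold the same five strings as the printed output above.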
sieste answered Nov 15 '22 07:11