Match and replace multiple strings in a vector of text without looping in R

Tags: string, r, gsub

I am trying to use gsub in R to replace each match of an element of vector a with the corresponding element of vector b. For example:

a <- c("don't", "i'm", "he'd")
b <- c("do not", "i am", "he would")
c <- c("i'm going to the party", "he'd go too")
newc <- gsub(a, b, c)

with the desired result being

newc = c("i am going to the party", "he would go too")

This approach does not work because gsub uses only the first element of a and b when they are longer than length 1. Looping over a and b looks like it will be very slow, since the real a and b have a length of 90 and c has a length > 200,000. Is there a vectorized way in R to perform this operation?

boomt asked Apr 02 '15 00:04




2 Answers

1) gsubfn

gsubfn in the gsubfn package is like gsub except that the replacement can be a character string, list, function or proto object. If it's a list, each matched string is replaced with the component of the list whose name equals the matched string. Here the pattern "\\S+" matches each whitespace-delimited token.

library(gsubfn)
gsubfn("\\S+", setNames(as.list(b), a), c)

giving:

[1] "i am going to the party" "he would go too"    
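
The same lookup-by-matched-string idea can be sketched in base R with gregexpr and regmatches<-. This is only an illustration, not part of the original answer: the names txt, lookup, pattern and result are made up for the example, and it assumes the entries of a contain no regex metacharacters, since they are pasted into one alternation pattern.

```r
a <- c("don't", "i'm", "he'd")
b <- c("do not", "i am", "he would")
# `txt` stands in for the question's `c` (renamed to avoid masking base::c)
txt <- c("i'm going to the party", "he'd go too")

# Named vector: names are the strings to find, values are the replacements
lookup <- setNames(b, a)

# One alternation pattern; assumes `a` has no regex metacharacters
pattern <- paste(a, collapse = "|")

# Locate every match in every element, then swap each match for its lookup value
m <- gregexpr(pattern, txt)
result <- txt
regmatches(result, m) <- lapply(regmatches(txt, m),
                                function(hit) unname(lookup[hit]))
print(result)
```

Unlike the "\\S+" pattern, this alternation does not require a match to be a whole whitespace-delimited token, so it also replaces occurrences that are adjacent to punctuation.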

2) gsub

For a solution with no packages, try this loop:

cc <- c
for(i in seq_along(a)) cc <- gsub(a[i], b[i], cc, fixed = TRUE)

giving:

> cc
[1] "i am going to the party" "he would go too"        
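
Note that the loop above runs once per pattern (90 iterations for the real data), while each gsub call is itself vectorized over all elements of the text. A minimal sketch of the same loop wrapped in a function, where the helper name replace_all_fixed and the variable txt are hypothetical and not from the original answer:

```r
# Hypothetical helper: applies each pattern/replacement pair in turn.
# The loop is over the patterns only; every gsub() call processes the
# whole `text` vector at once.
replace_all_fixed <- function(text, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    text <- gsub(patterns[i], replacements[i], text, fixed = TRUE)
  }
  text
}

a <- c("don't", "i'm", "he'd")
b <- c("do not", "i am", "he would")
# `txt` stands in for the question's `c` (renamed to avoid masking base::c)
txt <- c("i'm going to the party", "he'd go too")

newc <- replace_all_fixed(txt, a, b)
print(newc)
```

Because the replacements are applied sequentially, the text produced by an earlier pair can be matched again by a later pattern, so the order of a and b may matter for other data.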
G. Grothendieck answered Sep 21 '22 14:09


stringr::str_replace_all() is an option:

library(stringr)
names(b) <- a
str_replace_all(c, b)
[1] "i am going to the party" "he would go too"  

Here is the same code but with different labels to hopefully make it a little clearer:

to_replace <- a
replace_with <- b
target_text <- c

names(replace_with) <- to_replace
str_replace_all(target_text, replace_with)
sbha answered Sep 25 '22 14:09