Replace a set of pattern matches with corresponding replacement strings in R

Tags: string, replace, r

The str_replace() (and preg_replace()) function in PHP replaces all occurrences of the search string with the replacement string. What interests me most here is that if the search and replace arguments are arrays (vectors, in R terms), str_replace() takes a value from each array and uses each pair to search and replace on the subject.

In other words, does R (or some R package) have a function to perform the following:

string <- "The quick brown fox jumped over the lazy dog."
patterns     <- c("quick", "brown", "fox")
replacements <- c("slow",  "black", "bear")
xxx_replace_xxx(string, patterns, replacements)          ## ???
## [1] "The slow black bear jumped over the lazy dog."

So I am looking for something like chartr(), but for search patterns and replacement strings of arbitrary length. This cannot be done with a single call to gsub(), because its replacement argument can only be a single string (see ?gsub). My current implementation is:

xxx_replace_xxx <- function(string, patterns, replacements) {
   # apply each pattern/replacement pair in turn, matching patterns literally
   for (i in seq_along(patterns))
      string <- gsub(patterns[i], replacements[i], string, fixed=TRUE)
   string
}
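A short sketch (base R only; the toy strings are chosen purely for illustration) of why the loop is needed and what it does: a single gsub() call uses only the first pattern/replacement pair, chartr() only maps single characters, and the loop's output depends on the order of the pairs, because a later pattern can match text produced by an earlier replacement.

```r
# gsub() is not vectorised over 'pattern'/'replacement': only the first
# element of each is used (with a warning), so one call handles one pair
one_call <- suppressWarnings(
  gsub(c("quick", "brown"), c("slow", "black"), "quick brown", fixed = TRUE))
one_call  # "slow brown"

# chartr() is vectorised over characters, but maps single characters only
chartr("abc", "xyz", "aabbcc")  # "xxyyzz"

# Sequential application: the "b" produced by the first pair is rewritten
# by the second pair, so swapping the order changes the result
s1 <- gsub("b", "c", gsub("a", "b", "ab", fixed = TRUE), fixed = TRUE)  # "cc"
s2 <- gsub("a", "b", gsub("b", "c", "ab", fixed = TRUE), fixed = TRUE)  # "bc"
```

This matches PHP's str_replace(), which also processes the array pairs one after another.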

However, I am looking for something much faster when length(patterns) is large: I have a lot of data to process, and the current performance is not good enough.
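One alternative that may be worth benchmarking (it has not been timed here) is a single regex pass: combine all patterns into one PCRE alternation, find every match with gregexpr(), and substitute through `regmatches<-`. This is a minimal sketch under stated assumptions: replace_once() is a hypothetical name, patterns are matched literally via \Q...\E (so none may contain "\E"), and longer patterns take precedence at the same position. Its semantics differ from the loop: each match is replaced exactly once, so output of one replacement is never rewritten by another pattern, which is closer to PHP's strtr() than to str_replace().

```r
# Hypothetical helper: one regex pass, each match replaced exactly once
replace_once <- function(string, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  if (!length(patterns)) return(string)
  # Try longer patterns first so that "za" wins over "z" at the same position
  ord <- order(nchar(patterns), decreasing = TRUE)
  # \Q...\E makes PCRE match each pattern literally (assumes no "\E" inside)
  rx <- paste0("\\Q", patterns[ord], "\\E", collapse = "|")
  m <- gregexpr(rx, string, perl = TRUE)
  # Map every matched substring back to its replacement string
  regmatches(string, m) <- lapply(regmatches(string, m), function(hit)
    replacements[match(hit, patterns)])
  string
}

replace_once("The quick brown fox jumped over the lazy dog.",
             c("quick", "brown", "fox"), c("slow", "black", "bear"))
```

On the toy sentence from the question this gives the same result as the loop; on data where replacement strings contain other patterns, the two approaches can disagree.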

Example toy data for benchmarking:

string <- readLines("http://www.gutenberg.org/files/31536/31536-0.txt", encoding="UTF-8")
patterns <- c("jak", "to", "do", "z", "na", "i", "w", "za", "tu", "gdy",
   "po", "jest", "Tadeusz", "lub", "razem", "nas", "przy", "oczy", "czy",
   "sam", "u", "tylko", "bez", "ich", "Telimena", "Wojski", "jeszcze")
replacements <- paste0(patterns, rev(patterns))
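For reference, a small offline check of what the replacement vector above looks like: rev(patterns) glues each pattern to its mirror-image partner (1st with 27th, 2nd with 26th, and so on). A side effect worth knowing when comparing implementations is that several replacement strings contain other patterns, so the sequential loop rewrites some of its own output.

```r
patterns <- c("jak", "to", "do", "z", "na", "i", "w", "za", "tu", "gdy",
   "po", "jest", "Tadeusz", "lub", "razem", "nas", "przy", "oczy", "czy",
   "sam", "u", "tylko", "bez", "ich", "Telimena", "Wojski", "jeszcze")
replacements <- paste0(patterns, rev(patterns))

# First few pairs: "jak"+"jeszcze", "to"+"Wojski", "do"+"Telimena", "z"+"ich"
head(replacements, 4)  # "jakjeszcze" "toWojski" "doTelimena" "zich"

# "jakjeszcze" contains "z", the 4th pattern, so the loop will touch it again
grepl("z", replacements[1], fixed = TRUE)  # TRUE
```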
asked Oct 31 '14 at 13:10 by gagolews

1 Answer

Using PCRE (perl=TRUE) instead of fixed matching takes about a third of the time on my machine for your example:

xxx_replace_xxx_pcre <- function(string, patterns, replacements) {
   for (i in seq_along(patterns))
      string <- gsub(patterns[i], replacements[i], string, perl=TRUE)
   string
}
system.time(x <- xxx_replace_xxx(string, patterns, replacements))
#    user  system elapsed 
#   0.491   0.000   0.491 
system.time(p <- xxx_replace_xxx_pcre(string, patterns, replacements))
#    user  system elapsed 
#   0.162   0.000   0.162 
identical(x,p)
# [1] TRUE
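One caveat when switching from fixed=TRUE to perl=TRUE: the patterns are then interpreted as regular expressions. That is harmless for the plain words in the example data, but a pattern containing a metacharacter such as "." would match differently. A minimal illustration, with \Q...\E shown as one way to keep PCRE matching literal:

```r
# Literal matching: "." is just a dot
fixed_res  <- gsub(".", "x", "a.b", fixed = TRUE)       # "axb"
# PCRE matching: "." is a metacharacter that matches any character
perl_res   <- gsub(".", "x", "a.b", perl = TRUE)        # "xxx"
# Quoting the pattern with \Q...\E makes PCRE treat it literally again
quoted_res <- gsub("\\Q.\\E", "x", "a.b", perl = TRUE)  # "axb"
```

So the identical(x, p) check above holds for this data set, but it should be re-checked if the patterns can contain regex syntax.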
answered Oct 13 '22 at 02:10 by Joshua Ulrich