
Using purrr to iteratively replace strings in a dataframe column

Tags: r, gsub, purrr

I would like to use purrr to iteratively run several string replacements on a dataframe column with the gsub() function.

This is the example dataframe:

df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5), 
                          rep("a bb", 3), 
                          rep("a cc", 2)))

> df
   Year Text
1  2019 a aa
2  2019 a aa
3  2019 a aa
4  2019 a aa
5  2019 a aa
6  2019 a bb
7  2019 a bb
8  2019 a bb
9  2019 a cc
10 2019 a cc

This is how I would normally run the string replacements, along with the desired result:

df$Text <- gsub("aa", "One", df$Text, fixed = TRUE)
df$Text <- gsub("bb", "Two", df$Text, fixed = TRUE)
df$Text <- gsub("cc", "Three", df$Text, fixed = TRUE)

> df
   Year    Text
1  2019   a One
2  2019   a One
3  2019   a One
4  2019   a One
5  2019   a One
6  2019   a Two
7  2019   a Two
8  2019   a Two
9  2019 a Three
10 2019 a Three

However, this approach does not scale as the list of string replacements grows, so I tried to use purrr to iterate over a list of patterns and replacements, but I have only managed to produce warning messages. I expect the code to iterate through text_pattern and text_replacement and run gsub() on df$Text for each pattern/replacement pair. The example is below, along with the warnings.

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

walk2(text_pattern, text_replacement, function(...){
  gsub(text_pattern, text_replacement, df$Text, fixed = F)
  }
)

Warning messages:
1: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
4: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
5: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
6: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used

Is it possible to accomplish this using functions from purrr? Or am I using the wrong tool, and is there a different function I should be using?

asked Jun 05 '26 17:06 by Fragilaria


1 Answer

We can use reduce2() from purrr, which carries the result of each replacement forward into the next one:

library(purrr)
library(stringr)
df$Text <- reduce2(text_pattern, text_replacement, ~ str_replace(..1, ..2, ..3), 
           .init = df$Text)
df$Text
#[1] "a One"   "a One"   "a One"   "a One"   "a One"   "a Two"   "a Two"   "a Two"   "a Three" "a Three"
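One caveat, as far as I can tell: str_replace() changes only the first match in each string and treats the pattern as a regular expression, while gsub(..., fixed = TRUE) replaces every literal occurrence. For the example data the two agree, but they can differ when a pattern occurs more than once. A minimal sketch of the difference, assuming a made-up input vector x that is not part of the question:

```r
library(purrr)
library(stringr)

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# Hypothetical input (not from the question) with a repeated match
x <- c("a aa aa", "a bb", "a cc")

# str_replace() changes only the first match in each string
first_only <- reduce2(text_pattern, text_replacement,
                      ~ str_replace(..1, ..2, ..3), .init = x)

# str_replace_all() with fixed() is closer to gsub(..., fixed = TRUE):
# every literal occurrence is replaced
all_matches <- reduce2(text_pattern, text_replacement,
                       ~ str_replace_all(..1, fixed(..2), ..3), .init = x)
```

If the real data can contain a pattern several times per string, the str_replace_all() form is probably the safer choice.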

Or, without an anonymous function, pass str_replace directly:

reduce2(text_pattern, text_replacement, .init = df$Text, str_replace)
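As for why the walk2() attempt in the question only produced warnings: the anonymous function takes `...` but never uses it, so each call passes the full text_pattern and text_replacement vectors to gsub(), and walk2() discards whatever the function returns, so df would not be updated either way. If staying with base gsub() is preferred, the same reduce2() idea might look like the sketch below; the argument names acc, pat and repl are my own choice, not anything purrr requires.

```r
library(purrr)

df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5),
                          rep("a bb", 3),
                          rep("a cc", 2)))

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# acc holds the text after the previous replacements;
# pat and repl are the current pattern/replacement pair
df$Text <- reduce2(text_pattern, text_replacement,
                   function(acc, pat, repl) gsub(pat, repl, acc, fixed = TRUE),
                   .init = df$Text)
```

Another option that avoids explicit iteration: stringr::str_replace_all() also accepts a named character vector of pattern = replacement pairs.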
answered Jun 08 '26 08:06 by akrun


