 

Using dplyr + gsub on many columns

Tags: r, dplyr, gsub

I'm using dplyr and gsub to replace special characters. I'm trying to translate code I originally wrote in base R.

Here's a fake example to resemble my data:

library(dplyr)

region = c("regi\xf3n de tarapac\xe1","regi\xf3n de tarapac\xe1")
provincia = c("cami\xf1a","iquique")
comuna = c("tamarugal","alto hospicio")

comunas = cbind(region,provincia,comuna)
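A note on the setup: cbind() on character vectors returns a character matrix, not a data frame, while dplyr verbs such as mutate() are defined for data frames (the original answer below builds a data.frame instead). A quick check, sketched with plain-ASCII stand-ins for the values (hypothetical, chosen so the example does not depend on the locale's handling of raw bytes):

```r
library(dplyr)

# ASCII stand-ins for the question's vectors (hypothetical sample values)
region    <- c("region de tarapaca", "region de tarapaca")
provincia <- c("camina", "iquique")
comuna    <- c("tamarugal", "alto hospicio")

as_matrix <- cbind(region, provincia, comuna)   # a character matrix
as_df     <- data.frame(region, provincia, comuna, stringsAsFactors = FALSE)

is.matrix(as_matrix)   # cbind() kept the matrix shape
is.data.frame(as_df)   # data.frame() gives dplyr what it expects

# mutate() has no method for a plain matrix, so capture the error instead of stopping
res <- tryCatch(mutate(as_matrix, comuna = toupper(comuna)),
                error = function(e) e)
inherits(res, "error")

# After converting to a data frame the same call works
as_df %>% mutate(comuna = toupper(comuna))
```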

This works for me:

comunas = comunas %>% 
  mutate(comuna = gsub("\xe1", "\u00e1", comuna), # a with acute
         comuna = gsub("<e1>", "\u00e1", comuna) # a with acute
  )

But now I want to apply the same to every column:

comunas = comunas %>% 
  mutate_all(funs(gsub("\xe1", "\u00e1", .), # a with acute
         gsub("<e1>", "\u00e1", .) # a with acute
  ))

But this last chunk has no effect. The goal is to obtain:

     region                     provincia   comuna         
[1,] "regi\xf3n de tarapacá" "cami\xf1a" "tamarugal"    
[2,] "regi\xf3n de tarapacá" "iquique"   "alto hospicio"

The same approach should then work for any other replacements I need.

Any ideas? Many thanks in advance!
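On why the funs() chunk does not combine the two substitutions: judging by the original answer's output further down, each function listed in funs() is applied to the column on its own and yields an extra suffixed column, so no single result carries both fixes. Chaining requires composing the replacements into one function. A minimal sketch, assuming literal "<e1>"-style markers in the data; fix_a() and fix_o() are hypothetical helpers, and "<f3>" (ó) is an assumed second marker:

```r
library(dplyr)

# Hypothetical helpers, each handling one literal marker
fix_a <- function(x) gsub("<e1>", "\u00e1", x, fixed = TRUE)  # a with acute
fix_o <- function(x) gsub("<f3>", "\u00f3", x, fixed = TRUE)  # o with acute

x <- "regi<f3>n de tarapac<e1>"

# Applied separately: each result still contains the other marker
separate <- c(fix_a(x), fix_o(x))

# Composed: both substitutions end up in one value
combined <- fix_o(fix_a(x))

# In a pipeline, hand the composed function to across()
df <- data.frame(region = c(x, x), stringsAsFactors = FALSE)
df %>% mutate(across(everything(), ~ fix_o(fix_a(.x))))
```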

asked Apr 17 '17 at 17:04 by pachadotdev



1 Answer

2021 update

mutate_all has since been superseded by across(), introduced in dplyr 1.0.0. Here are two ways to use gsub on many columns with dplyr:

library(dplyr)

#Without anonymous function (passing extra arguments through across()'s ... is deprecated as of dplyr 1.1.0)
comunas_casen_2015 %>%
  mutate(across(everything(), gsub, pattern = "\xe1|<e1>", replacement = "\u00e1"))

#With anonymous function
comunas_casen_2015 %>%
  mutate(across(everything(),~ gsub("\xe1|<e1>","\u00e1", .)))

              region provincia        comuna
1 región de tarapacá    camiña     tamarugal
2 región de tarapacá   iquique alto hospicio

#data
region = c("regi\xf3n de tarapac\xe1","regi\xf3n de tarapac\xe1")
provincia = c("cami\xf1a","iquique")
comuna = c("tamarugal","alto hospicio")
comunas_casen_2015 = data.frame(region,provincia,comuna,stringsAsFactors=FALSE)
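The question also asks about other substitutions. One way to extend the across() approach to several of them is a named lookup vector folded over each column with Reduce(). This is a sketch only: the "<f3>" and "<f1>" markers for ó and ñ are assumed by analogy with the question's "<e1>", and replace_all() and replacements are hypothetical names:

```r
library(dplyr)

# Hypothetical lookup table: literal pattern -> replacement
replacements <- c("<e1>" = "\u00e1",  # a with acute
                  "<f3>" = "\u00f3",  # o with acute
                  "<f1>" = "\u00f1")  # n with tilde

# Apply every pattern in turn to one character vector (literal matching)
replace_all <- function(x, table) {
  Reduce(function(acc, pat) gsub(pat, table[[pat]], acc, fixed = TRUE),
         names(table), init = x)
}

df <- data.frame(
  region    = c("regi<f3>n de tarapac<e1>", "regi<f3>n de tarapac<e1>"),
  provincia = c("cami<f1>a", "iquique"),
  comuna    = c("tamarugal", "alto hospicio"),
  stringsAsFactors = FALSE
)

# Anonymous-function form, applied to every column
df %>% mutate(across(everything(), ~ replace_all(.x, replacements)))
```

Keeping the substitutions in one named vector means adding another character is a one-line change, and fixed = TRUE avoids regex surprises from the angle brackets.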

Original answer

This works for me:

region = c("regi\xf3n de tarapac\xe1","regi\xf3n de tarapac\xe1")
provincia = c("cami\xf1a","iquique")
comuna = c("tamarugal","alto hospicio")

comunas_casen_2015 = data.frame(region,provincia,comuna,stringsAsFactors=FALSE)


comunas_casen_2015 %>%
  mutate(region = gsub("\xe1", "\u00e1", region), # a with acute
         region = gsub("<e1>", "\u00e1", region) # a with acute
  )
  
  
# funs() is deprecated as of dplyr 0.8.0; the across() versions above are the current equivalent
comunas_casen_2015 %>%
  mutate_all(funs(gsub("\xe1", "\u00e1", .), # a with acute
         gsub("<e1>", "\u00e1", .) # a with acute
  ))

              region provincia        comuna        region_gsub provincia_gsub   comuna_gsub
1 región de tarapacá    camiña     tamarugal región de tarapacá         camiña     tamarugal
2 región de tarapacá   iquique alto hospicio región de tarapacá        iquique alto hospicio
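Since the question mentions translating from base R, here is a base R counterpart for comparison: lapply() runs gsub() on every column, and assigning into comunas[] keeps the result a data.frame. A sketch assuming literal "<e1>" markers in the data; the sample values are hypothetical:

```r
# Hypothetical sample data with a literal "<e1>" marker left to fix
comunas <- data.frame(
  region    = c("regi\u00f3n de tarapac<e1>", "regi\u00f3n de tarapac<e1>"),
  provincia = c("cami\u00f1a", "iquique"),
  comuna    = c("tamarugal", "alto hospicio"),
  stringsAsFactors = FALSE
)

# pattern and replacement are named, so each column is passed as gsub()'s x
comunas[] <- lapply(comunas, gsub, pattern = "<e1>",
                    replacement = "\u00e1", fixed = TRUE)
comunas
```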
answered Nov 15 '22 at 20:11 by Pierre Lapointe
