Apply different functions to columns of a dataframe selecting functions by name

Let's say I've got a dataframe with multiple columns, some of which I want to transform. The column names define what transformation needs to be used.

library(tidyverse)
set.seed(42)
df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt

Ideally, I would like to use an across call in which the column names are matched up with the names of the functions in trans and the transformations are performed on the fly. The desired output is the following:

df_trans <- df %>% 
  dplyr::mutate(log10 = trans$log10(log10),
                log2 = trans$log2(log2),
                log1p = trans$log1p(log1p),
                sqrt = trans$sqrt(sqrt))
df_trans

However, I don't want to specify each transformation manually. This representative example has only 4, but the number could vary and be significantly higher, making manual specification cumbersome and error-prone.

I have managed to match up the column names with the functions by turning the trans list into a data frame and left-joining but am then unable to call the function in the trans_function column.

trans_df <- enframe(trans, value = "trans_function")
df %>% 
  pivot_longer(cols = everything()) %>% 
  left_join(trans_df) %>% 
  dplyr::mutate(value = trans_function(value))

Error: Problem with mutate() column value.
i value = trans_function(value).
x could not find function "trans_function"

I think I either need to find a way of calling the functions from the list columns or another way of matching up the function names with the column names. All ideas are welcome.

asked Dec 19 '21 09:12

biomiha

2 Answers

We can use cur_column() in across to get the column name and use it to subset trans.

library(dplyr)

df %>%
  mutate(across(all_of(names(trans)), ~ trans[[cur_column()]](.x))) %>%
  head()

#  A         B    log10     log2    log1p     sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590

Comparing it with the output of df_trans:

head(df_trans)

#  A         B    log10     log2    log1p     sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590
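
If trans might contain entries with no matching column in the data frame, selecting by names(trans) directly would fail. A minimal sketch of one way around that, assuming it is acceptable to skip such entries silently, is to select with any_of(). The small df and the extra log2 entry below are made up for illustration.

```r
library(dplyr)

set.seed(42)
# Hypothetical small data frame: note there is no log2 column
df <- data.frame(A = 1:5,
                 log10 = runif(5, 10, 100),
                 sqrt = runif(5, 10, 100))

# trans has one entry (log2) without a matching column in df
trans <- list(log10 = log10, log2 = log2, sqrt = sqrt)

# any_of() keeps only the names that exist in df, so log2 is ignored
res <- df %>%
  mutate(across(any_of(names(trans)), ~ trans[[cur_column()]](.x)))

res
```

Columns that are not named in trans (here A) are left untouched.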

answered Oct 17 '22 21:10

Ronak Shah

One way is to use lapply:

library(tidyverse)
set.seed(42)

df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt


# Apply the matching function to each column named in trans; keep the rest as-is
df_trans <- lapply(names(df), function(x) {
  if (x %in% names(trans)) trans[[x]](df[[x]]) else df[[x]]
}) %>%
  setNames(names(df)) %>%
  bind_cols() %>%
  as.data.frame()

head(df_trans)

which gives:

  A         B    log10     log2    log1p     sqrt
  1 1 0.1365052 1.739051 6.301896 4.530600 4.318942
  2 2 0.1771364 1.549601 5.793220 4.521715 3.649834
  3 3 0.5195605 1.902438 4.819125 3.343266 6.788565
  4 4 0.8111208 1.572253 6.219991 4.075945 3.322401
  5 5 0.1153620 1.751276 6.306097 4.060292 7.817301
  6 6 0.8934218 1.724403 6.201123 3.235938 9.749128

The original dataframe being:

head(df)
  A         B    log10     log2    log1p     sqrt
  1 1 0.1365052 54.83409 78.89684 91.81428 18.65326
  2 2 0.1771364 35.44878 55.45401 90.99323 13.32129
  3 3 0.5195605 79.88006 28.22936 27.31143 46.08461
  4 4 0.8111208 37.34675 74.54249 57.90612 11.03835
  5 5 0.1153620 56.39961 79.12693 56.99123 61.11019
  6 6 0.8934218 53.01557 73.57393 24.43022 95.04549
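
A variation on the same base R idea, sketched here with a made-up small data frame, is to pair each function with its column using Map() and overwrite only the matching columns. The helper name cols is an assumption for illustration, not part of the original code.

```r
set.seed(42)
# Hypothetical small data frame for illustration
df <- data.frame(A = 1:5,
                 log10 = runif(5, 10, 100),
                 sqrt = runif(5, 10, 100))
trans <- list(log10 = log10, sqrt = sqrt)

# Only touch columns that appear in both trans and df
cols <- intersect(names(trans), names(df))

df_trans <- df
# Map() calls each function on the column with the same name
df_trans[cols] <- Map(function(f, x) f(x), trans[cols], df[cols])

df_trans
```

This avoids rebuilding the whole data frame, so column order and the untransformed columns stay exactly as they were.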

answered Oct 17 '22 21:10

storm surge
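
On the list-column attempt from the question: after the join, trans_function is a list-column holding one function (or NULL) per row, so it cannot be called like a single function. A rough sketch of calling it element by element with purrr::map2_dbl() follows. The small df is made up for illustration, and passing values through unchanged where no function matched is an assumption about the desired behaviour.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

set.seed(42)
# Hypothetical small data frame for illustration
df <- data.frame(A = 1:5,
                 B = runif(5),
                 log10 = runif(5, 10, 100),
                 sqrt = runif(5, 10, 100))
trans <- list(log10 = log10, sqrt = sqrt)

trans_df <- enframe(trans, value = "trans_function")

long <- df %>%
  pivot_longer(cols = everything()) %>%
  left_join(trans_df, by = "name") %>%
  # Rows without a matching function get NULL in the list-column;
  # those values are passed through unchanged
  mutate(value = map2_dbl(trans_function, value,
                          function(f, v) if (is.null(f)) v else f(v))) %>%
  select(-trans_function)

long
```

This calls a function once per row, so it is likely to be much slower than the across() approach on large data, and the result stays in long format.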

Tags:

r

dplyr

apply