
Apply different functions to columns of a dataframe selecting functions by name

Tags: r, dplyr, apply

Let's say I've got a dataframe with multiple columns, some of which I want to transform. The column names define what transformation needs to be used.

library(tidyverse)
set.seed(42)
df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt

Ideally, I would like to use an across call where the column names were matched up with the trans function names and the transformations would be performed on the fly. The desired output is the following:

df_trans <- df %>% 
  dplyr::mutate(log10 = trans$log10(log10),
                log2 = trans$log2(log2),
                log1p = trans$log1p(log1p),
                sqrt = trans$sqrt(sqrt))
df_trans

However, I don't want to specify each transformation manually. This representative example has only 4, but the number could vary and be significantly higher, which would make manual specification cumbersome and error-prone.

I have managed to match up the column names with the functions by turning the trans list into a data frame and left-joining it, but I am then unable to call the functions stored in the trans_function column.

trans_df <- enframe(trans, value = "trans_function")
df %>% 
  pivot_longer(cols = everything()) %>% 
  left_join(trans_df) %>% 
  dplyr::mutate(value = trans_function(value))

Error: Problem with mutate() column value.
i value = trans_function(value).
x could not find function "trans_function"

I think I either need to find a way of calling the functions from the list columns or another way of matching up the function names with the column names. All ideas are welcome.
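For reference, a minimal sketch of the first option (calling functions held in a list column). The error above arises because trans_function is a list column, not a function, so R finds no function of that name to call. The sketch assumes purrr is available and uses a small made-up data frame (df_small) rather than the data above; it calls each stored function on its own row with map2_dbl() and leaves rows without a matching function unchanged.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(purrr)

# Small illustrative data (hypothetical); column names match the names in `trans`
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))
trans <- list(log10 = log10, sqrt = sqrt)

# One row per function; `trans_function` is a list column of functions
trans_df <- enframe(trans, value = "trans_function")

long_trans <- df_small %>%
  pivot_longer(cols = everything()) %>%
  left_join(trans_df, by = "name") %>%
  # Call each list element on its own value; rows from columns without a
  # matching function (here A) carry NULL and keep their value
  mutate(value = map2_dbl(value, trans_function,
                          function(v, f) if (is.null(f)) v else f(v))) %>%
  select(-trans_function)
```

This evaluates one function call per cell, so it is likely to be much slower than a column-wise approach on large data.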

biomiha asked Dec 19 '21 09:12

2 Answers

We can use cur_column() in across to get the column name and use it to subset trans.

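library(dplyr)

# For each column named in trans, look up the function with the same name
# via cur_column() and apply it; all_of() errors if a name is missing from df
df %>%
  mutate(across(all_of(names(trans)), ~ trans[[cur_column()]](.x))) %>%
  head()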

#  A         B    log10     log2    log1p     sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590

Compare this with the output of df_trans:

head(df_trans)

#  A         B    log10     log2    log1p     sqrt
#1 1 0.9148060 1.821920 6.486402 3.998918 3.470303
#2 2 0.9370754 1.470472 5.821200 3.932046 7.496103
#3 3 0.2861395 1.469690 6.437524 2.799395 8.171007
#4 4 0.8304476 1.653261 5.639570 3.700698 6.905755
#5 5 0.6417455 1.976905 4.597484 4.500461 9.441077
#6 6 0.5190959 1.985133 5.638341 4.551289 4.440590
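If trans might contain entries with no matching column, a possible variant is to select with any_of(), which silently skips missing names. This is a minimal sketch on a small made-up data frame (df_small); the extra exp entry is hypothetical and only there to show that unmatched names are ignored.

```r
library(dplyr)

# Small illustrative data (hypothetical)
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))

# `exp` has no matching column in df_small
trans <- list(log10 = log10, sqrt = sqrt, exp = exp)

res <- df_small %>%
  # any_of() keeps only the names that exist as columns, so `exp` is skipped
  mutate(across(any_of(names(trans)), ~ trans[[cur_column()]](.x)))
```

Here the log10 and sqrt columns are transformed, A is untouched, and no exp column is created.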
Ronak Shah answered Oct 17 '22 21:10


One way is to use lapply:

library(tidyverse)
set.seed(42)

df <- data.frame(A = 1:100, B = runif(n = 100, 0, 1), log10 = runif(n = 100, 10, 100), log2 = runif(n = 100, 10, 100), log1p = runif(n = 100, 10, 100), sqrt = runif(n = 100, 10, 100))
trans <- list()
trans$log10 <- log10
trans$log2 <- log2
trans$log1p <- log1p
trans$sqrt <- sqrt


# Apply the matching function to each column that has one; keep the rest as-is
df_trans <- setNames(lapply(names(df), function(x) {
  if (x %in% names(trans)) trans[[x]](df[[x]]) else df[[x]]
}), names(df)) %>%
  bind_cols() %>%
  as.data.frame()

head(df_trans)

which gives:

  A         B    log10     log2    log1p     sqrt
  1 1 0.1365052 1.739051 6.301896 4.530600 4.318942
  2 2 0.1771364 1.549601 5.793220 4.521715 3.649834
  3 3 0.5195605 1.902438 4.819125 3.343266 6.788565
  4 4 0.8111208 1.572253 6.219991 4.075945 3.322401
  5 5 0.1153620 1.751276 6.306097 4.060292 7.817301
  6 6 0.8934218 1.724403 6.201123 3.235938 9.749128

The original data frame was:

head(df)
  A         B    log10     log2    log1p     sqrt
  1 1 0.1365052 54.83409 78.89684 91.81428 18.65326
  2 2 0.1771364 35.44878 55.45401 90.99323 13.32129
  3 3 0.5195605 79.88006 28.22936 27.31143 46.08461
  4 4 0.8111208 37.34675 74.54249 57.90612 11.03835
  5 5 0.1153620 56.39961 79.12693 56.99123 61.11019
  6 6 0.8934218 53.01557 73.57393 24.43022 95.04549
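The same idea can probably be written more compactly in base R by pairing each function with its column via Map() and assigning the results back. This is a sketch on a small made-up data frame (df_small), not the data above; intersect() guards against names in trans that have no column.

```r
# Small illustrative data (hypothetical)
df_small <- data.frame(A = 1:3, log10 = c(10, 100, 1000), sqrt = c(4, 9, 16))
trans <- list(log10 = log10, sqrt = sqrt)

# Only transform columns that have a function of the same name
cols <- intersect(names(trans), names(df_small))

df_out <- df_small
# Map() pairs each function with its column; results overwrite those columns
df_out[cols] <- Map(function(f, x) f(x), trans[cols], df_small[cols])
```

Columns without a matching function (here A) are left as they were, and no extra packages are needed.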
storm surge answered Oct 17 '22 21:10