How to mutate multiple columns as function of multiple columns systematically?

Tags: r, dplyr, across

I have a tibble with a number of variables collected over time. A very simplified version of the tibble looks like this.

df = tribble(
~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
'row_1', 5, 10, 2, 4,
'row_2', 20, 50, 4, 6
)

I want to systematically create a new set of variables varC so that varC.t# = varA.t# / varB.t#, where # is 1, 2, 3, etc. (following the same naming pattern as the columns in the tibble above).

How do I use something along the lines of mutate or across to do this?

Maher Said asked Apr 10 '21 03:04




2 Answers

You can do something like this with mutate(across(...)). There is probably a shorter way to handle the column renaming, though.

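library(dplyr)
library(stringr)

df %>% 
  # divide each varA column by the varB column with the same time suffix
  mutate(across(.cols = c(varA.t1, varA.t2),
                .fns = ~ .x / get(str_replace(cur_column(), "varA", "varB")),
                .names = "V_{.col}")) %>%
  # rename the temporary V_varA.t# columns to varC.t#
  rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))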

# A tibble: 2 x 7
  id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 row_1       5      10       2       4     2.5    2.5 
2 row_2      20      50       4       6     5      8.33

If there is a long time series you can also create a vector for .cols beforehand.
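As a rough sketch of that idea, the numerator columns can be collected into a character vector first and passed to .cols with all_of(). The name a_cols is only illustrative, and the sketch assumes every varA.t# column has a matching varB.t# column. It also puts the renaming directly into the .names glue specification, which may remove the need for the separate rename_with() step.

```r
library(dplyr)
library(stringr)

df <- tribble(
  ~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
  'row_1', 5, 10, 2, 4,
  'row_2', 20, 50, 4, 6
)

# Collect the numerator columns once (a_cols is a hypothetical helper name);
# this picks up every varA.t# column, however many time points there are
a_cols <- str_subset(names(df), "^varA\\.")

res <- df %>%
  mutate(across(.cols = all_of(a_cols),
                # look up the varB column that shares the current time suffix
                .fns = ~ .x / get(str_replace(cur_column(), "varA", "varB")),
                # build the output name from the input name: varA.t# -> varC.t#
                .names = "{str_replace(.col, 'varA', 'varC')}"))

res
```

The new columns are appended after the existing ones, so the layout should match the output shown above.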

AnilGoyal answered Oct 02 '22 00:10



I have a package on GitHub called {dplyover} which aims to solve this kind of problem in a way similar to dplyr::across.

The function is called across2. It lets you define two sets of columns to which you can apply one or several functions. The .names argument supports two glue specifications: {pre} and {suf}. They extract the shared prefix and suffix of the variable names, which makes it easy to give the output variables sensible names.

The function has one caveat. It is not performant when applied to highly grouped data (there is a vignette with benchmarks).

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df = tribble(
  ~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
  'row_1', 5, 10, 2, 4,
  'row_2', 20, 50, 4, 6
)

df %>% 
  mutate(across2(starts_with("varA"),
                 starts_with("varB"),
                 ~ .x / .y,
                 .names = "{pre}C.{suf}"))

#> # A tibble: 2 x 7
#>   id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
#>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 row_1       5      10       2       4     2.5    2.5 
#> 2 row_2      20      50       4       6     5      8.33

Created on 2021-04-10 by the reprex package (v0.3.0)
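If installing a package from GitHub is not an option, one possible alternative (not part of {dplyover}, just a sketch using tidyr) is to reshape the data so that varA and varB become ordinary columns, compute the ratio once, and reshape back. It assumes the column names always follow the variable.time pattern with a single dot as separator.

```r
library(dplyr)
library(tidyr)

df <- tribble(
  ~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
  'row_1', 5, 10, 2, 4,
  'row_2', 20, 50, 4, 6
)

res <- df %>%
  # split "varA.t1" into a variable part (kept as columns) and a time part
  pivot_longer(-id, names_to = c(".value", "time"), names_sep = "\\.") %>%
  # one row per id and time point, so the ratio is a plain column operation
  mutate(varC = varA / varB) %>%
  # spread back to one column per variable and time point
  pivot_wider(names_from = time,
              values_from = c(varA, varB, varC),
              names_sep = ".")

res
```

This scales to any number of time points without listing columns, at the cost of two reshaping steps.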

TimTeaFan answered Oct 02 '22 02:10
