I have a tibble with a number of variables collected over time. A very simplified version of the tibble looks like this.
df = tribble(
~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
'row_1', 5, 10, 2, 4,
'row_2', 20, 50, 4, 6
)
I want to systematically create a new set of variables varC
so that varC.t#
= varA.t#
/ varB.t#
where #
is 1, 2, 3, etc. (similarly to the way column names are setup in the tibble above).
How do I use something along the lines of mutate
or across
to do this?
Apply any function to all R data frame You can set the MARGIN argument to c(1, 2) or, equivalently, to 1:2 to apply the function to each value of the data frame. If you set MARGIN = c(2, 1) instead of c(1, 2) the output will be the same matrix but transposed. The output is of class “matrix” instead of “data.
across() returns a tibble with one column for each column in .
To get multiple columns of matrix, specify the column numbers as a vector preceded by a comma, in square brackets, after the matrix variable name. This expression returns the required columns as a matrix.
You can do something like this with mutate(across...
, however, for renaming columns there must be a shortcut.
df %>%
mutate(across(.cols = c(varA.t1, varA.t2),
.fns = ~ .x / get(glue::glue(str_replace(cur_column(), "varA", "varB"))),
.names = "V_{.col}")) %>%
rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))
# A tibble: 2 x 7
id varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 row_1 5 10 2 4 2.5 2.5
2 row_2 20 50 4 6 5 8.33
If there is a long time series you can also create a vector for .cols
beforehand.
I have a package on GitHub called {dplyover} which aims to solve this kind of problem in way similar to dplyr::across
.
The function is called across2
. It lets you define two sets of columns to which you can apply one or several functions. The .names
argument supports two glue specifictions: {pre}
and {suf}
. They extract the shared pre- and suffix of the variable names. This makes it easy to put nice names on our output variables.
The function has one caveat. It is not performant when applied to highly grouped data (there is a vignette with benchmarks).
library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover
df = tribble(
~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
'row_1', 5, 10, 2, 4,
'row_2', 20, 50, 4, 6
)
df %>%
mutate(across2(starts_with("varA"),
starts_with("varB"),
~ .x / .y,
.names = "{pre}C.{suf}"))
#> # A tibble: 2 x 7
#> id varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 row_1 5 10 2 4 2.5 2.5
#> 2 row_2 20 50 4 6 5 8.33
Created on 2021-04-10 by the reprex package (v0.3.0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With