I have to following issue using R. In short I want to create multiple new columns in a data frame based on calculations of different column pairs in the data frame.
The data looks as follows:
df <- data.frame(a1 = c(1:5),
b1 = c(4:8),
c1 = c(10:14),
a2 = c(9:13),
b2 = c(3:7),
c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1 4 10 9 3 15
2 5 11 10 4 16
3 6 12 11 5 17
4 7 13 12 6 18
5 8 14 13 7 19
The output is supposed to look like the following:
a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1 4 10 9 3 15 10 7 25
2 5 11 10 4 16 12 9 27
4 7 13 12 6 18 16 13 31
5 8 14 13 7 19 18 15 33
I can achieve this using dplyr doing some manual work in the following way:
df %>% rowwise %>% mutate(sum_a = sum(a1, a2),
sum_b = sum(b1, b2),
sum_c = sum(c1, c2)) %>%
as.data.frame()
So what is being done is: take columns with the letter "a" in it, calulate the sum rowwise, and create a new column with the sum named sum_[letter]. Repeat for columns with different letters.
This is working, however, if I have a large data set with say 300 different column pairs the manual input would be significant, since I would have to write 300 mutate calls.
I recently stumbled upon the R package "purrr" and my guess is that this would solve my problem of doing what I want in a more automated way.
In particular, I would think to be able to use purrr:map2 to which I pass two lists of column names.
Then I could calculate the sum of each matching list entry, in the form of:
map2(list1, list2, ~mutate(sum))
However, I am not able to figure out how to best approach this problem using purrr. I am rather new to using purrr, so I would really appreciate any help on this issue.
9! mutate() can be used to create variables based on existing variables from the dataset. You can also create multiple columns at once, separating each new variable with a comma.
To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.
The map functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input. map() always returns a list. See the modify() family for versions that return an object of the same type as the input.
Here is one option with purrr
. We get the unique
prefix of the names
of the dataset ('nm1'), use map
(from purrr
) to loop through the unique names, select
the column that matches
the prefix value of 'nm1', add the rows using reduce
and the bind the columns (bind_cols
) with the original dataset
library(tidyverse)
nm1 <- names(df) %>%
substr(1, 1) %>%
unique
nm1 %>%
map(~ df %>%
select(matches(.x)) %>%
reduce(`+`)) %>%
set_names(paste0("sum_", nm1)) %>%
bind_cols(df, .)
# a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1 1 4 10 9 3 15 10 7 25
#2 2 5 11 10 4 16 12 9 27
#3 3 6 12 11 5 17 14 11 29
#4 4 7 13 12 6 18 16 13 31
#5 5 8 14 13 7 19 18 15 33
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With