Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(tidyverse approach) calculating rowsum across several columns where info on columns to include comes from a different data frame

Tags:

r

tidyverse

Assume the following data:

dat <- data.frame(x1 = c(1, 2, 3, 4, 5),
                  x2 = c(2, 3, 4, 5, 6),
                  x3 = c(3, 4, 5, 6, 7),
                  x4 = c(7, 2, 3, 4, 5),
                  x5 = c(7, 2, 1, 4, 5))

Further assume the following lookup table:

lookup_positions <- data.frame(v1 = c(1,3,5),
                               v2 = c(1,2,5),
                               v3 = c(1,3,4),
                               v4 = c(2,3,5))

Now, what I want to do is the following: for each row in dat I want to go through all combinations specified in lookup_positions and calculate the row sums for the dat column positions specified in lookup_positions.

So for all rows in dat I want to calculate the row sum of dat[,c(1,3,5)], then I want to calculuate the row sum of dat[, c(1,2,5)] and so on. So I basically calculate 4 row sums.

I know how to do it in base R using a loop, I also now how to do it in tidyverse approach for one row sum, but not how to do it for all versions mentioned in lookup_positions with tidyverse without loops.

So expected outcome would be:

  x1 x2 x3 x4 x5 rowsum1 rowsum2 rowsum3 rowsum4
1  1  2  3  7  7      11      10      11      12
2  2  3  4  2  2       8       7       8       9
3  3  4  5  3  1       9       8      11      10
4  4  5  6  4  4      14      13      14      15
5  5  6  7  5  5      17      16      17      18

Here's what I got for one of the lookup_positions in the tidyverse. But I'm stuck in how to generalize this for all lookup positions.

dat %>%
  mutate(rowsum1 = apply(across(everything()), 1, function(x) sum(x[as.numeric(lookup_positions[1,])])))

I know for my 4 lookup positions I could simply do a copy paste and be done with it, but my real life data has a few hundred lookup position combinations.

like image 767
deschen Avatar asked Dec 02 '20 12:12

deschen


People also ask

How do I sum multiple rows in R?

Syntax: mutate(new-col-name = rowSums(.)) The rowSums() method is used to calculate the sum of each row and then append the value at the end of each row under the new column name specified. The argument . is used to apply the function over all the cells of the data frame. Syntax: rowSums(.)

What does rowwise () do in R?

rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.


2 Answers

One dplyr and purrr option could be:

map2(.x = asplit(lookup_positions, 2),
     .y = 1:ncol(lookup_positions),
     ~ dat %>%
      mutate(!!paste0("rowsums", .y) := rowSums(select(., .x)))) %>%
 reduce(full_join)

  x1 x2 x3 x4 x5 rowsums1 rowsums2 rowsums3 rowsums4
1  1  2  3  7  7       11       10       11       12
2  2  3  4  2  2        8        7        8        9
3  3  4  5  3  1        9        8       11       10
4  4  5  6  4  4       14       13       14       15
5  5  6  7  5  5       17       16       17       18
like image 135
tmfmnk Avatar answered Nov 15 '22 05:11

tmfmnk


Here is another tidyverse solution you may be interested in

library(dplyr)
library(purrr)
library(stringr)

dat %>% 
  mutate(map_dfc(
    lookup_positions %>% rename_with(~str_replace(., "v", "rowsum")), 
    ~rowSums(.y[, .x]), 
    across(everything())
  ))

Output

  x1 x2 x3 x4 x5 rowsum1 rowsum2 rowsum3 rowsum4
1  1  2  3  7  7      11      10      11      12
2  2  3  4  2  2       8       7       8       9
3  3  4  5  3  1       9       8      11      10
4  4  5  6  4  4      14      13      14      15
5  5  6  7  5  5      17      16      17      18
like image 36
ekoam Avatar answered Nov 15 '22 04:11

ekoam