
How to sum n highest values by row using dplyr without reshaping?

Tags: r, dplyr

I would like to create a new column based on the n highest values per row of a data frame.

Take the following example:

library(tibble)
df <- tribble(~name, ~q_1, ~q_2, ~q_3, ~sum_top_2,
              "a", 4, 1, 5, 9,
              "b", 2, 8, 9, 17)

Here, the sum_top_2 column sums the 2 highest values of columns prefixed with "q_". I would like to generalize to the n highest values by row. How can I do this using dplyr without reshaping?

nicholas asked Jun 10 '21 23:06


1 Answer

One option is pmap from purrr, which loops over the rows of the columns selected by starts_with('q_'). For each row, sort the values in decreasing order, take the first n sorted elements with head, and sum them.

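library(dplyr)
library(purrr)
library(stringr)
n <- 2  # number of highest values to sum per row
df %>% 
   # !! with := builds the column name ("sum_top_2") from n;
   # on dplyr >= 1.1.0, pick(starts_with('q_')) replaces select(cur_data(), ...)
   mutate(!! str_c("sum_top_", n) := pmap_dbl(select(cur_data(), 
           starts_with('q_')), 
            ~ sum(head(sort(c(...), decreasing = TRUE), n))))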

-output

# A tibble: 2 x 5
  name    q_1   q_2   q_3 sum_top_2
  <chr> <dbl> <dbl> <dbl>     <dbl>
1 a         4     1     5         9
2 b         2     8     9        17
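
To see what the anonymous function does for a single row, the steps can be traced in base R on row "a" from the question. This is only an illustrative sketch; the names row_a, sorted, and top_n are made up here and do not appear in the answer.

```r
# Values of q_1, q_2, q_3 for row "a" in the question's data
row_a <- c(4, 1, 5)
n <- 2

# Step 1: sort the row from largest to smallest
sorted <- sort(row_a, decreasing = TRUE)

# Step 2: keep the first n elements of the sorted vector
top_n <- head(sorted, n)

# Step 3: add them up; this matches sum_top_2 for row "a"
sum(top_n)
```

If n is larger than the number of "q_" columns, head simply returns every value, so the result falls back to the plain row sum.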

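Alternatively, use rowwise from dplyr together with c_across, which collects the selected columns of the current row into a single vector.

df %>% 
   rowwise() %>% 
   mutate(!! str_c("sum_top_", n) := sum(head(sort(c_across(starts_with("q_")), 
           decreasing = TRUE), n))) %>% 
   ungroup()

-output

# A tibble: 2 x 5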
  name    q_1   q_2   q_3 sum_top_2
  <chr> <dbl> <dbl> <dbl>     <dbl>
1 a         4     1     5         9
2 b         2     8     9        17
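
If the same calculation is needed for several values of n, the sort/head/sum step could be pulled into a small helper. The sketch below assumes a hypothetical function name, sum_top_n, and uses the glue-style "sum_top_{n}" := naming that recent dplyr versions accept in place of !! str_c(...); it rebuilds the question's data without the precomputed sum_top_2 column.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of dplyr): sum of the n largest values of x.
# sort() drops NA values by default, so missing entries are ignored.
sum_top_n <- function(x, n) {
  sum(head(sort(x, decreasing = TRUE), n))
}

# The question's data, without the hand-computed sum_top_2 column
df <- tribble(~name, ~q_1, ~q_2, ~q_3,
              "a", 4, 1, 5,
              "b", 2, 8, 9)

n <- 2
res <- df %>%
  rowwise() %>%
  # "sum_top_{n}" is filled in with the value of n, giving sum_top_2
  mutate("sum_top_{n}" := sum_top_n(c_across(starts_with("q_")), n)) %>%
  ungroup()

res
```

Because rowwise evaluates the expression once per row, this is convenient rather than fast; for large data frames the pmap_dbl version may be the better fit.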

akrun answered Oct 02 '22 00:10