Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to rowSums by group vector in R?

Tags:

r

Suppose I have a data frame such like:

df<-data.frame(A.1=1:5,B.1=2:6,C.1=3:7, 
               D.2=4:8,E.2=5:9,F.2=6:10)
df
  A.1 B.1 C.1 D.2 E.2 F.2
1   1   2   3   4   5   6
2   2   3   4   5   6   7
3   3   4   5   6   7   8
4   4   5   6   7   8   9
5   5   6   7   8   9  10

What I wanted is to rowSums() by a group vector which is the column names of df without Letters (e.g. c(1,1,1,2,2,2)) and the output would be:

      1  2
[1,]  6 15
[2,]  9 18
[3,] 12 21
[4,] 15 24
[5,] 18 27

My real data set has more than 110K cols from 18 groups and would find an elegant and easy way to realize it.

like image 443
David Z Avatar asked Sep 02 '25 04:09

David Z


2 Answers

rowsum can do this:

t(rowsum(t(df), c(1,1,1,2,2,2)))

      1  2
[1,]  6 15
[2,]  9 18
[3,] 12 21
[4,] 15 24
[5,] 18 27

(For whatever reason, there is no colsum.)

like image 166
Frank Avatar answered Sep 05 '25 01:09

Frank


Because of the way data.frames are structured internally, row-wise operations are generally much slower than column-wise operations. Given your comment about how large this data.frame actually is, I would probably use data.table to convert it to long, isolate the group as its own variable, and perform a group-wise sum.

df <- data.frame(
  A.1 = 1:5,
  B.1 = 6:10,
  C.2 = 11:15,
  D.2 = 16:20
)

First, convert the data.frame to data.table using setDT.

library(data.table)

setDT(df)

Then, add a row_number column (:= creates a new column; .N is a special variable containing the number of rows in the table).

df[, row_number := 1:.N]

Convert this to a "long" data.table, using row_number as the unique ID column.

df_long <- melt(df, id.vars = "row_number")
df_long
#>     row_number variable value
#>  1:          1      A.1     1
#>  2:          2      A.1     2
#>  3:          3      A.1     3
#>  4:          4      A.1     4
#>  5:          5      A.1     5
#>  6:          1      B.1     6
#>  7:          2      B.1     7
#>  8:          3      B.1     8
#>  9:          4      B.1     9
#> 10:          5      B.1    10
#> 11:          1      C.2    11
#> 12:          2      C.2    12
#> 13:          3      C.2    13
#> 14:          4      C.2    14
#> 15:          5      C.2    15
#> 16:          1      D.2    16
#> 17:          2      D.2    17
#> 18:          3      D.2    18
#> 19:          4      D.2    19
#> 20:          5      D.2    20

Create a new group column that is everything after the "." in the new "variable" column (^.*?\\. is everything from the beginning of the string to the first "."; gsub(pattern, "", variable) removes pattern from variable).

df_long[, group := as.integer(gsub("^.*?\\.", "", variable))]
df_long
#>     row_number variable value group
#>  1:          1      A.1     1     1
#>  2:          2      A.1     2     1
#>  3:          3      A.1     3     1
#>  4:          4      A.1     4     1
#>  5:          5      A.1     5     1
#>  6:          1      B.1     6     1
#>  7:          2      B.1     7     1
#>  8:          3      B.1     8     1
#>  9:          4      B.1     9     1
#> 10:          5      B.1    10     1
#> 11:          1      C.2    11     2
#> 12:          2      C.2    12     2
#> 13:          3      C.2    13     2
#> 14:          4      C.2    14     2
#> 15:          5      C.2    15     2
#> 16:          1      D.2    16     2
#> 17:          2      D.2    17     2
#> 18:          3      D.2    18     2
#> 19:          4      D.2    19     2
#> 20:          5      D.2    20     2

Finally, do sum(value) grouping by row_number and group. These sorts of operations are blisteringly fast in data.table, significantly faster and more efficient than base R.

df_long[, sum(value), by = list(row_number, group)]
#>     row_number group V1
#>  1:          1     1  7
#>  2:          2     1  9
#>  3:          3     1 11
#>  4:          4     1 13
#>  5:          5     1 15
#>  6:          1     2 27
#>  7:          2     2 29
#>  8:          3     2 31
#>  9:          4     2 33
#> 10:          5     2 35
like image 40
Alexey Shiklomanov Avatar answered Sep 05 '25 01:09

Alexey Shiklomanov