Summing by groups of rows in R

Tags: r, matrix, sum, apply

This is a bit of a difficult question to title, so edits welcome. The data looks like this:

mat =         

     [,1]
 [1,] 9.586352e-04
 [2,]           NA
 [3,] 2.605841e-03
 [4,] 7.868957e-05
 [5,] 1.000000e+00
 [6,]           NA
 [7,] 8.208500e-02
 [8,] 2.605841e-03
 [9,] 7.868957e-05
[10,] 1.000000e+00
[11,] 9.586352e-04
[12,] 8.208500e-02
[13,] 2.605841e-03
[14,] 7.868957e-05
[15,] 1.000000e+00

I want to sum every 5 elements. Since there are 15, the returned vector should have length 3 (15/5). The NAs can simply be counted as 0.

How do I do this?

I also want to ignore the NAs.

asked Jan 24 '15 00:01 by wolfsatthedoor
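A minimal base-R sketch of the requirement, separate from the answers below. It assumes the values are retyped from the printout above (so they carry only the displayed digits) and that the length is a multiple of 5. Reshaping the column into a 5-row matrix puts each block of 5 consecutive values in its own column, and colSums(na.rm = TRUE) sums each block while skipping NAs, which gives the same totals as counting them as 0.

```r
# Values retyped from the printed matrix in the question
mat <- matrix(c(9.586352e-04, NA, 2.605841e-03, 7.868957e-05, 1,
                NA, 8.208500e-02, 2.605841e-03, 7.868957e-05, 1,
                9.586352e-04, 8.208500e-02, 2.605841e-03, 7.868957e-05, 1),
              ncol = 1)

# Assumes nrow(mat) is a multiple of 5: matrix() fills column by column,
# so column j holds rows (5*j - 4) to (5*j) of the original data
blocks <- matrix(mat, nrow = 5)

# Sum each column; na.rm = TRUE drops NAs, the same as counting them as 0
block_sums <- colSums(blocks, na.rm = TRUE)
block_sums
```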


4 Answers

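Using rollapply() from the zoo package:

# example data: 1:15 as a one-column matrix, with rows 3 and 7 set to NA
m <- matrix(1:15, ncol = 1)
m[c(3, 7), 1] <- NA

library(zoo)
# windows of 5 rows that advance 5 rows at a time, so they do not overlap
rollapply(m, width = 5, FUN = sum, by = 5, na.rm = TRUE)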
     [,1]
[1,]   12
[2,]   33
[3,]   65
answered Sep 20 '22 13:09 by DatamineR


You could use tapply()

mat <- matrix(c(1, 2, NA, 4:6, NA, 8:15))
## set up a grouping vector
grp <- rep(1:(nrow(mat)/5), each = 5)
## compute group sums
tapply(mat, grp, sum, na.rm = TRUE)
#  1  2  3 
# 12 33 65   

A less efficient option involves split() with vapply()

vapply(split(mat, grp), sum, 1, na.rm = TRUE)
#  1  2  3 
# 12 33 65 
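To see what the split() step produces, here is a small sketch using the same mat and grp as above. split() returns a list with one element per group, and the third argument of vapply(), FUN.VALUE, is a template (the 1 above, spelled numeric(1) here) that every sum must match, so vapply() stops with an error if a result is not a single number.

```r
mat <- matrix(c(1, 2, NA, 4:6, NA, 8:15))
grp <- rep(1:(nrow(mat)/5), each = 5)

# one list element per group, named "1", "2", "3"
parts <- split(mat, grp)
str(parts)

# numeric(1) is the same template as the literal 1 used in the answer
group_sums <- vapply(parts, sum, numeric(1), na.rm = TRUE)
group_sums
```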
answered Sep 22 '22 13:09 by Rich Scriven


This is ideal for rowsum() (see ?rowsum), which should be fast.

Using RStudent's data (the matrix m from the first answer):

rowsum(m, rep(1:3, each=5), na.rm=TRUE)

The second argument, group, defines which rows are summed together. More generally, the group vector can be built as rep(1:nrow(m), each=5, length=nrow(m)), which also works when the number of rows is not a multiple of 5 (substitute length() for nrow() when applying it to a vector).
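A small sketch of that general grouping vector, assuming a hypothetical 12-row matrix m12 (12 is not a multiple of 5). rep() accepts length= as an abbreviation of length.out, which cuts the repeated labels off at the number of rows, so the last group simply holds the 2 leftover rows.

```r
# Hypothetical 12-row matrix with one NA, to show a length that is not a multiple of 5
m12 <- matrix(c(1:6, NA, 8:12), ncol = 1)

# labels 1 to 12, each repeated 5 times, truncated to nrow(m12): five 1s, five 2s, two 3s
grp <- rep(1:nrow(m12), each = 5, length.out = nrow(m12))

# rowsum() adds up the rows that share a group label; na.rm = TRUE skips the NA
res_m <- rowsum(m12, grp, na.rm = TRUE)
res_m

# For a plain vector x, use length(x) in place of nrow()
x <- c(1:6, NA, 8:12)
res_x <- rowsum(x, rep(1:length(x), each = 5, length.out = length(x)), na.rm = TRUE)
res_x
```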

answered Sep 21 '22 13:09 by user20650


Using dplyr

library(dplyr)
mat <- matrix(c(1, 2, NA, 4:6, NA, 8:15))
df <- data.frame(mat)

df %>%
  mutate(group = rep(1:(n()/5), each=5)) %>%
  group_by(group) %>%
  summarise(mat = sum(mat, na.rm = TRUE))

You get:

#Source: local data frame [3 x 2]

#  group mat
#1     1  12
#2     2  33
#3     3  65

If, for some reason, you want to replace NAs with 0 (because you want to perform an operation other than sum(), say mean()), you can do:

df %>%
  mutate(mat = ifelse(is.na(mat), 0, mat)) %>%
  mutate(group = rep(1:(n()/5), each=5)) %>%
  group_by(group) %>%
  summarise(mat = mean(mat))

You'll get the result with NAs counted as 0, so each mean is taken over all 5 rows (instead of dropping the NAs with na.rm = TRUE as in the previous suggestion):

#Source: local data frame [3 x 2]

#  group  mat
#1     1  2.4
#2     2  6.6
#3     3 13.0
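To make the difference between the two treatments concrete, here is a base-R sketch on the same mat (using tapply() rather than dplyr). Averaging with na.rm = TRUE divides by the number of non-missing values in each group, while replacing NAs with 0 first divides by all 5 rows, which reproduces the dplyr means above.

```r
mat <- matrix(c(1, 2, NA, 4:6, NA, 8:15))
grp <- rep(1:(nrow(mat)/5), each = 5)

# Drop NAs: group 1 averages 1, 2, 4, 5 (4 values)
mean_drop <- tapply(mat, grp, mean, na.rm = TRUE)

# Count NAs as 0: group 1 averages 1, 2, 0, 4, 5 (5 values)
mat0 <- replace(mat, is.na(mat), 0)
mean_zero <- tapply(mat0, grp, mean)

rbind(mean_drop, mean_zero)
```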
answered Sep 21 '22 13:09 by Steven Beaupré