Consider the array a:
> a <- array(c(1:9, 1:9), c(3,3,2))
> a
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
, , 2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
How can we efficiently compute the row sums of each matrix indexed by the third dimension, so that the result is the following?
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
The column sums are easy via the 'dims' argument of colSums():
> colSums(a, dims = 1)
but I cannot find a way to use rowSums() on the array to achieve the desired result, as it has a different interpretation of 'dims' to that of colSums().
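To make the difference concrete, here is a small sketch comparing the shapes the two functions return for the array above. As I read ?colSums, colSums() sums over the first 'dims' dimensions, whereas rowSums() sums over dimensions dims + 1 onwards, so no value of 'dims' makes rowSums() sum over the second dimension alone. The variable names cs, r1 and r2 are only for illustration.

```r
a <- array(c(1:9, 1:9), c(3, 3, 2))

# colSums() sums over the first `dims` dimensions (here: the rows).
cs <- colSums(a, dims = 1)
dim(cs)      # a 3 x 2 matrix: one sum per column and slice

# rowSums() sums over dimensions dims + 1 onwards.
r1 <- rowSums(a, dims = 1)
length(r1)   # a vector of length 3: summed over columns AND slices

r2 <- rowSums(a, dims = 2)
dim(r2)      # a 3 x 3 matrix: summed over slices only
```

Neither r1 nor r2 has the 3 x 2 shape asked for in the question.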
It is simple to compute the desired row sums using:
> apply(a, 3, rowSums)
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
but that is just hiding the loop. Are there other efficient, truly vectorised, ways of computing the required row sums?
@Fojtasek's answer, which mentioned splitting up the array, reminded me of the aperm() function, which allows one to permute the dimensions of an array. As colSums() works, we can swap the first two dimensions using aperm() and run colSums() on the output.
> colSums(aperm(a, c(2,1,3)))
[,1] [,2]
[1,] 12 12
[2,] 15 15
[3,] 18 18
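Because both slices of a are identical, the example above cannot reveal an indexing mistake. A minimal sketch of a stricter check follows, using a hypothetical test array x with distinct values and unequal dimensions (not part of the original question). It compares the aperm() approach with the apply() baseline, and also tries a variant that moves the columns to the last dimension and uses rowSums() with dims = 2. That variant is an addition here, not one of the posted answers.

```r
# Hypothetical test array: 4 rows, 3 columns, 2 slices, all values distinct.
x <- array(1:24, c(4, 3, 2))

# Baseline: the loop is hidden inside apply().
baseline <- apply(x, 3, rowSums)

# Swap rows and columns, then sum over the (new) first dimension.
via_cols <- colSums(aperm(x, c(2, 1, 3)))

# Move columns to the last dimension, then sum that dimension out.
via_rows <- rowSums(aperm(x, c(1, 3, 2)), dims = 2)

all.equal(baseline, via_cols)
all.equal(baseline, via_rows)
```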
Some comparison timings of this and the other suggested R-based answers:
> b <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rs1 <- apply(b, 3, rowSums))
user system elapsed
1.831 0.394 2.232
> system.time(rs2 <- rowSums3d(b))
user system elapsed
1.134 0.183 1.320
> system.time(rs3 <- sapply(1:dim(b)[3], function(i) rowSums(b[,,i])))
user system elapsed
1.556 0.073 1.636
> system.time(rs4 <- colSums(aperm(b, c(2,1,3))))
user system elapsed
0.860 0.103 0.966
So on my system the aperm() solution appears marginally faster:
> sessionInfo()
R version 2.12.1 Patched (2011-02-06 r54249)
Platform: x86_64-unknown-linux-gnu (64-bit)
However, rowSums3d() doesn't give the same answers as the other solutions:
> all.equal(rs1, rs2)
[1] "Mean relative difference: 0.01999992"
> all.equal(rs1, rs3)
[1] TRUE
> all.equal(rs1, rs4)
[1] TRUE
You could chop up the array into two dimensions, compute row sums on that, and then put the output back together the way you want it. Like so:
rowSums3d <- function(a){
  m <- matrix(a,ncol=ncol(a))
  rs <- rowSums(m)
  matrix(rs,ncol=2)
}
> a <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rowSums3d(a))
user system elapsed
1.73 0.17 1.96
> system.time(apply(a, 3, rowSums))
user system elapsed
3.09 0.46 3.74
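A likely reason rowSums3d() can disagree with the apply() result, as far as I can tell: matrix(a, ncol = ncol(a)) refills the data column by column into a matrix with twice as many rows, so each row of m mixes columns from both slices instead of holding one row of one slice. On the small 3 x 3 x 2 example the two slices are identical and the mixed-up sums happen to coincide with the correct ones. Below is a minimal sketch of a variant that permutes the dimensions first, so that the reshape keeps each (row, slice) pair on its own row. The names rowSums3d_fixed and x are hypothetical and used only for illustration.

```r
# Sketch of a reshape-based version that permutes before flattening.
rowSums3d_fixed <- function(a) {
  d <- dim(a)
  # Move columns to the last dimension: p[i, k, j] == a[i, j, k].
  p <- aperm(a, c(1, 3, 2))
  # One row per (row, slice) pair, one column per original column.
  m <- matrix(p, ncol = d[2])
  # The sums come out slice by slice, so refill with one column per slice.
  matrix(rowSums(m), ncol = d[3])
}

# Hypothetical test array with distinct values and an even column count.
x <- array(1:48, c(4, 6, 2))
all.equal(rowSums3d_fixed(x), apply(x, 3, rowSums))
```

The extra aperm() call copies the array, so this variant is unlikely to be faster than calling colSums() on the permuted array directly; it is shown only to illustrate the indexing.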