Aggregate rows in a large matrix by rowname

Tags:

r

aggregate

I would like to aggregate the rows of a matrix by adding the values in rows that have the same rowname. My current approach is as follows:

> M
  a b c d
1 1 1 2 0
1 2 3 4 2
2 3 0 1 2
3 4 2 5 2
> index <- as.numeric(rownames(M))
> M <- cbind(M,index)
> Dfmat <- data.frame(M)
> Dfmat <- aggregate(. ~ index, data = Dfmat, sum)
> M <- as.matrix(Dfmat)
> rownames(M) <- M[,"index"]
> M <- subset(M, select= -index)
> M
   a b c d
 1 3 4 6 2
 2 3 0 1 2
 3 4 2 5 2
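To run the example, the matrix printed above can be rebuilt as follows. This is a reconstruction: the values of M are read off the printout, and res is a hypothetical name for the result so that M is not overwritten. The steps are the same aggregate() pipeline as above, slightly condensed.

```r
# Reconstruction of the example matrix shown above (values read off the printout)
M <- matrix(c(1, 1, 2, 0,
              2, 3, 4, 2,
              3, 0, 1, 2,
              4, 2, 5, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("1", "1", "2", "3"), c("a", "b", "c", "d")))

# Add the row names as a grouping column, sum every other column within
# each group, then move the group values back into the row names
index <- as.numeric(rownames(M))
Dfmat <- data.frame(cbind(M, index))
Dfmat <- aggregate(. ~ index, data = Dfmat, sum)
res <- as.matrix(Dfmat[, setdiff(names(Dfmat), "index")])
rownames(res) <- Dfmat$index
res
```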

The problem with this approach is that I need to apply it to a number of very large matrices (up to 1,000 rows and 30,000 columns). In these cases the computation time is very high (the same problem occurs when using ddply). Is there a more efficient way to arrive at the solution? Does it help that the original input matrices are DocumentTermMatrix objects from the tm package? As far as I know, they are stored in a sparse matrix format.

asked Nov 15 '11 16:11

Christian

2 Answers

Here's a solution using by and colSums, although it requires some fiddling because of the default output of by.

M <- matrix(1:9,3)
rownames(M) <- c(1,1,2)
t(sapply(by(M,rownames(M),colSums),identity))
  V1 V2 V3
1  3  9 15
2  3  6  9
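The t(sapply(..., identity)) step is needed because by() returns a list-like "by" object rather than a matrix. As a sketch of the same group-then-colSums idea that builds the matrix directly (not part of the original answer; groups and res are hypothetical names), split() can group the row indices and vapply() can collect one column-sum vector per group:

```r
M <- matrix(1:9, 3)
rownames(M) <- c(1, 1, 2)

# Row indices grouped by row name: list("1" = 1:2, "2" = 3)
groups <- split(seq_len(nrow(M)), rownames(M))

# One column of sums per group; drop = FALSE keeps single-row groups as matrices
res <- t(vapply(groups,
                function(idx) colSums(M[idx, , drop = FALSE]),
                numeric(ncol(M))))
res
```

vapply() returns one column per group, so the final t() puts the groups back on the rows, matching the layout of the by() result.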

answered Oct 22 '22 13:10

James

There is now an aggregate.Matrix function in the Matrix.utils package. It accomplishes what you want in a single line of code and is about 10x faster than the combineByRow solution and 100x faster than the by solution:

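library(Matrix.utils)    # provides aggregate.Matrix()
library(microbenchmark)  # provides microbenchmark()

N <- 10000

m <- matrix(runif(N*100), nrow=N)
rownames(m) <- sample(1:(N/2), N, replace=TRUE)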

> microbenchmark(a<-t(sapply(by(m,rownames(m),colSums),identity)),b<-combineByRow(m),c<-aggregate.Matrix(m,row.names(m)),times = 10)
Unit: milliseconds
                                                  expr        min         lq       mean     median         uq        max neval
 a <- t(sapply(by(m, rownames(m), colSums), identity)) 6000.26552 6173.70391 6660.19820 6419.07778 7093.25002 7723.61642    10
                                  b <- combineByRow(m)  634.96542  689.54724  759.87833  732.37424  866.22673  923.15491    10
                c <- aggregate.Matrix(m, row.names(m))   42.26674   44.60195   53.62292   48.59943   67.40071   70.40842    10

> identical(as.vector(a),as.vector(c))
[1] TRUE
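If only the Matrix package is available, the same sum-by-row-name aggregation can be written as a product with a sparse indicator matrix, which keeps the data sparse throughout. This is a sketch using Matrix::fac2sparse(), a swapped-in technique that is not necessarily what aggregate.Matrix() does internally; the small matrix is the example from the question. A tm DocumentTermMatrix is a slam simple_triplet_matrix, and the comment shows one assumed way to convert it first.

```r
library(Matrix)

# Example from the question, stored as a sparse dgCMatrix.
# A tm DocumentTermMatrix `dtm` could be converted with (assumed conversion):
#   M <- sparseMatrix(i = dtm$i, j = dtm$j, x = dtm$v,
#                     dims = c(dtm$nrow, dtm$ncol), dimnames = dtm$dimnames)
M <- Matrix(c(1, 1, 2, 0,
              2, 3, 4, 2,
              3, 0, 1, 2,
              4, 2, 5, 2),
            nrow = 4, byrow = TRUE, sparse = TRUE,
            dimnames = list(c("1", "1", "2", "3"), c("a", "b", "c", "d")))

# One row per distinct row name, one column per original row:
# entry (g, r) is 1 when row r of M belongs to group g
ind <- fac2sparse(factor(rownames(M)))

# The product sums the rows of M within each group and stays sparse
agg <- ind %*% M
agg
```

The row names of the result come from the factor levels, so groups appear in sorted order, as with aggregate().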

EDIT: Frank is right: rowsum is somewhat faster than any of these solutions. You would only want to consider one of the other functions if you were working with a Matrix object, especially a sparse one, or if you needed an aggregation other than sum.
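For reference, base R's rowsum() takes the grouping vector directly and returns one row per group, sorted and labelled by the group values. A minimal example on the matrix from the first answer (res is a hypothetical name); note that rowsum() operates on ordinary dense matrices and data frames:

```r
M <- matrix(1:9, 3)
rownames(M) <- c(1, 1, 2)

# Sum the rows of M within each group; here the groups are the row names
res <- rowsum(M, rownames(M))
res
#   [,1] [,2] [,3]
# 1    3    9   15
# 2    3    6    9
```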

answered Oct 22 '22 11:10

Craig
