Efficient apply or mapply for multiple matrix arguments by row

Tags: r

I have two matrices that I want to apply a function to, by rows:

matrixA
           GSM83009  GSM83037  GSM83002  GSM83029  GSM83041
100001_at  5.873321  5.416164  3.512227  6.064150  3.713696
100005_at  5.807870  6.810829  6.105804  6.644000  6.142413
100006_at  2.757023  4.144046  1.622930  1.831877  3.694880

matrixB
          GSM82939 GSM82940 GSM82974 GSM82975
100001_at 3.673556 2.372952 3.228049 3.555816
100005_at 6.916954 6.909533 6.928252 7.003377
100006_at 4.277985 4.856986 3.670161 4.075533

I've found several similar questions, but not many answers: "mapply for matrices" and "Multi matrix row-wise mapply?". The code I have now splits the matrices by row into lists, but the splitting makes it rather slow, not much faster than a for loop, considering I have almost 9000 rows in each matrix:

scores <- mapply(t.test.stat, split(matrixA, row(matrixA)), split(matrixB, row(matrixB)))

The function itself is very simple, just finding the t-value:

t.test.stat <- function(x, y)
{
    return( (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y)) )
}
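
As a sanity check, this formula is the Welch (unequal-variance) t statistic, which is what t.test() reports when var.equal is left at its default of FALSE. A minimal sketch of that comparison follows; the vectors x and y and the seed are made up for illustration and stand in for one row of each matrix:

```r
# Same statistic as above, compared against stats::t.test()
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

set.seed(1)      # arbitrary seed, for reproducibility only
x <- rnorm(5)    # made-up data standing in for one row of matrixA
y <- rnorm(4)    # made-up data standing in for one row of matrixB

manual  <- t.test.stat(x, y)
builtin <- unname(t.test(x, y)$statistic)  # var.equal = FALSE by default (Welch)

all.equal(manual, builtin)
```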

asked Apr 11 '11 19:04

Edd

1 Answer

Splitting the matrices isn't the biggest contributor to evaluation time.

set.seed(21)
matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)

system.time( scores <- mapply(t.test.stat,
    split(matrixA, row(matrixA)), split(matrixB, row(matrixB))) )
#    user  system elapsed 
#    1.57    0.00    1.58 
smA <- split(matrixA, row(matrixA))
smB <- split(matrixB, row(matrixB))
system.time( scores <- mapply(t.test.stat, smA, smB) )
#    user  system elapsed 
#    1.14    0.00    1.14

Look at the output from Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). Basically, there's quite a bit of overhead from the function calls made for each of the 9000 rows.

Rprof()
scores <- mapply(t.test.stat, smA, smB)
Rprof(NULL)
summaryRprof()
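
To read the profile, it can help to look at the by.self and by.total tables that summaryRprof() returns. The following is only a sketch: the temporary file name, the 10 ms sampling interval, and the five repetitions are assumptions chosen so that the sampler collects enough data, and the exact rows you see will depend on your machine and R version.

```r
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

set.seed(21)
matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)
smA <- split(matrixA, row(matrixA))
smB <- split(matrixB, row(matrixB))

# Assumed settings: a temporary output file (Rprof() defaults to "Rprof.out")
# and a 10 ms sampling interval (the default is 20 ms)
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in 1:5) scores <- mapply(t.test.stat, smA, smB)  # repeated to collect more samples
Rprof(NULL)                                             # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent inside each function itself
head(prof$by.total)   # time including the functions it calls
```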

You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.

Since your function is simple, you can take advantage of the vectorized rowMeans function to do this almost instantaneously (though it's a bit messy):

system.time({
  ncA <- NCOL(matrixA)
  ncB <- NCOL(matrixB)
  ans <- (rowMeans(matrixA) - rowMeans(matrixB)) /
    sqrt( rowMeans((matrixA - rowMeans(matrixA))^2) * (ncA/(ncA-1)) / ncA +
          rowMeans((matrixB - rowMeans(matrixB))^2) * (ncB/(ncB-1)) / ncB )
})
#    user  system elapsed 
#      0       0       0 
head(ans)
# [1]  0.8272511 -1.0965269  0.9862844 -0.6026452 -0.2477661  1.1896181
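
The vectorized expression relies on two things. First, the sample variance of n values equals the mean of the squared deviations times n/(n-1). Second, matrixA - rowMeans(matrixA) recycles the vector of row means down each column (R stores matrices column by column), so every row has its own mean subtracted. Here is a sketch that checks both against the mapply version; the 100-row matrices and the intermediate names (centeredA, varA, slow, fast) are made up for this check:

```r
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

set.seed(21)
matrixA <- matrix(rnorm(5 * 100), nrow = 100)  # fewer rows than above, to keep the check quick
matrixB <- matrix(rnorm(4 * 100), nrow = 100)

# Reference result: one t.test.stat call per row
slow <- mapply(t.test.stat,
               split(matrixA, row(matrixA)), split(matrixB, row(matrixB)))

# Vectorized result, broken into named steps
ncA <- NCOL(matrixA)
ncB <- NCOL(matrixB)
centeredA <- matrixA - rowMeans(matrixA)          # row means recycle down each column
centeredB <- matrixB - rowMeans(matrixB)
varA <- rowMeans(centeredA^2) * ncA / (ncA - 1)   # per-row sample variance
varB <- rowMeans(centeredB^2) * ncB / (ncB - 1)
fast <- (rowMeans(matrixA) - rowMeans(matrixB)) / sqrt(varA / ncA + varB / ncB)

all.equal(varA, apply(matrixA, 1, var))   # variance identity holds row by row
all.equal(unname(slow), fast)             # mapply and vectorized results agree
```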

UPDATE
Here's a "cleaner" version using a rowVars function:

rowVars <- function(x, na.rm=FALSE, dims=1L) {
  rowMeans((x-rowMeans(x, na.rm, dims))^2, na.rm, dims)*(NCOL(x)/(NCOL(x)-1))
}
ans <- (rowMeans(matrixA)-rowMeans(matrixB)) /
  sqrt( rowVars(matrixA)/NCOL(matrixA) + rowVars(matrixB)/NCOL(matrixB) )
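
One caveat: the correction factor in rowVars uses NCOL(x), so with na.rm=TRUE and missing values it would use the wrong count for the affected rows. If your data can contain NAs, something like the following may be closer to what you want. This is a sketch only; rowVarsNA is a hypothetical name, and the small matrix m is made up to compare the result with apply(m, 1, var, na.rm = TRUE):

```r
# Hypothetical NA-aware variant: counts the non-missing values in each row
rowVarsNA <- function(x) {
  n <- rowSums(!is.na(x))                       # non-missing values per row
  centered <- x - rowMeans(x, na.rm = TRUE)     # subtract each row's mean
  rowSums(centered^2, na.rm = TRUE) / (n - 1)   # per-row sample variance
}

set.seed(21)
m <- matrix(rnorm(5 * 10), nrow = 10)   # made-up data
m[2, 3] <- NA
m[7, 1] <- NA

all.equal(rowVarsNA(m), apply(m, 1, var, na.rm = TRUE))
```

The same per-row count n would then replace NCOL(matrixA) and NCOL(matrixB) in the denominator of the t statistic.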

answered Nov 11 '22 21:11

Joshua Ulrich
