
Slow dot product in R

I am trying to take the dot product of a 331x23152 and a 23152x23152 matrix.

In Python and Octave this is a trivial operation, but in R it seems to be incredibly slow.

N <- 331
M <- 23152

mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M) 
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M) 
tm3 <- system.time({
    mat_3 = mat_1%*%mat_2
})
print(tm3)

The output is

user  system elapsed 
101.95    0.04  101.99 

In other words, this dot product takes over 100 seconds to execute.

I am running R 3.4.0 64-bit with RStudio v1.0.143 on an i7-4790 with 16 GB RAM. As such, I did not expect this operation to take so long.

Am I overlooking something? I have started looking into the bigmemory and bigalgebra packages, but I can't help thinking there is a solution that does not require resorting to packages.


EDIT

To give you an idea of time difference, here's a script for Octave:

n = 331;
m = 23152;

mat_1 = rand(n,m);
mat_2 = rand(m,m);
tic
mat_3 = mat_1*mat_2;
toc

The output is

Elapsed time is 3.81038 seconds.

And in Python:

import numpy as np
import time

n = 331
m = 23152

mat_1 = np.random.random((n,m))
mat_2 = np.random.random((m,m))
tm_1 = time.time()
mat_3 = np.dot(mat_1,mat_2)
tm_2 = time.time()
tm_3 = tm_2 - tm_1
print(tm_3)

The output is

2.781277894973755

As you can see, these numbers are not even in the same ballpark.

EDIT 2

At Zheyuan Li's request, here are toy examples of dot products.

In R:

mat_1 = matrix(c(1,2,1,2,1,2), nrow = 2, ncol = 3)
mat_2 = matrix(c(1,1,1,2,2,2,3,3,3), nrow = 3, ncol = 3)
mat_3 = mat_1 %*% mat_2
print(mat_3)

The output is:

     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    6   12   18

In Octave:

mat_1 = [1,1,1;2,2,2];
mat_2 = [1,2,3;1,2,3;1,2,3];
mat_3 = mat_1*mat_2

The output is:

mat_3 =

    3    6    9
    6   12   18

In Python:

import numpy as np

mat_1 = np.array([[1,1,1],[2,2,2]])
mat_2 = np.array([[1,2,3],[1,2,3],[1,2,3]])
mat_3 = np.dot(mat_1, mat_2)
print(mat_3)

The output is:

[[ 3  6  9]
 [ 6 12 18]]

For more information on matrix dot products: https://en.wikipedia.org/wiki/Matrix_multiplication

EDIT 3

The output for sessionInfo() is:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0 

EDIT 4

I tried the bigalgebra package but this did not seem to speed things up:

library('bigmemory')   # provides as.big.matrix
library('bigalgebra')

N <- 331
M <- 23152

mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_1 <- as.big.matrix(mat_1)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
    mat_3 = mat_1%*%mat_2
})
print(tm3)

The output is:

   user  system elapsed 
 101.79    0.00  101.81

EDIT 5

James suggested altering my randomly generated matrices:

N <- 331
M <- 23152

mat_1 = matrix( runif(N*M), N, M)
mat_2 = matrix( runif(M*M), M, M)
tm3 <- system.time({
    mat_3 = mat_1%*%mat_2
})
print(tm3)

The output is:

   user  system elapsed 
 102.46    0.05  103.00 
Asked Feb 05 '23 by BdB

1 Answer

A trivial operation? Matrix multiplication is always an expensive operation in linear algebra computations.

Actually, I think it is quite fast. A matrix multiplication at this size requires

2 * 23152 * 23152 * 331 ≈ 354.8e9 FLOP, i.e., 354.8 GFLOP

At 100 seconds, your performance is about 3.5 GFLOPS. Note that on most machines the performance is at most 0.8 - 2 GFLOPS, unless you have an optimized BLAS library.
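The arithmetic above can be checked with a quick script (Python here, matching the question's examples; the elapsed time is taken from the question's system.time output):

```python
# Verify the FLOP count and achieved throughput for a 331x23152
# times 23152x23152 matrix product: each of the n*m output entries
# needs m multiply-add pairs, i.e. 2*m floating-point operations.
n, m = 331, 23152
flops = 2 * n * m * m             # total floating-point operations
gflop = flops / 1e9
elapsed = 101.99                  # seconds, from the question

print(round(gflop, 1))            # 354.8 GFLOP of work
print(round(gflop / elapsed, 2))  # 3.48 GFLOP/s achieved
```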

If implementations elsewhere are faster, check whether they use an optimized BLAS or parallel computing. Your R is doing this computation with the standard BLAS and no parallelism.
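For instance, NumPy exposes its build configuration, so you can check which BLAS the fast Python run in the question was linked against (a minimal sketch; the exact output format varies between NumPy versions):

```python
import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK
# libraries it was linked against (e.g. OpenBLAS or MKL).
np.show_config()
```

If this reports OpenBLAS or MKL, the Python timing was produced by an optimized, multi-threaded BLAS rather than the reference implementation.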


Important

As of R 3.4.0, more BLAS-related tools are available.

First of all, sessionInfo() now returns the full path of the linked BLAS library. Note that it does not point to the symbolic link, but to the final shared object! The other answer here shows exactly this: that setup has OpenBLAS.

The timing result in the other answer implies that parallel computing (via multi-threading in OpenBLAS) is in place. It is hard for me to tell how many threads are used, but it looks like hyperthreading is on, as the "system" time is quite large!

Second, options(matprod = ...) can now select the matrix multiplication method. Although this option was introduced to deal with NA / NaN handling, it allows performance testing, too:

  • "internal" is a non-optimized triple-loop implementation written in C; it has the same performance as the standard (reference) BLAS written in FORTRAN 77;
  • "default", "blas" and "default.simd" all use the linked BLAS for the computation, but differ in how NA and NaN are checked. If R is linked to the standard BLAS then, as said, performance equals "internal"; otherwise we see a significant boost. Also note that the R team says "default.simd" might be removed in the future.
Answered Feb 16 '23 by Zheyuan Li