Rowwise matrix multiplication in R

Tags:

r

I have a matrix with 100 million rows and 100 columns.

I want to compute the product of the elements in each row of that matrix.

My sample code for the row products is:

df<-as.matrix(mtcars)
result<-apply(df,1,prod)

The above code is very slow in my case.

I tried the rowprods function from the Rfast package.

result<-Rfast::rowprods(df)

But that function runs out of memory.

NOTE: I have 8 GB of RAM on my system.

RSK asked Feb 20 '18 07:02

2 Answers

If your matrix is too large to fit in memory, you can use the bigstatsr package (disclaimer: I'm the author) to work with data stored on disk instead of in RAM. The big_apply function lets you apply standard R functions to blocks of the data and combine the results.

library(bigstatsr)
fbm <- FBM(10e6, 100)  # 10 million rows x 100 columns, stored on disk
# initialize with random numbers
system.time(
  big_apply(fbm, a.FUN = function(X, ind) {
    print(min(ind))
    X[, ind] <- rnorm(nrow(X) * length(ind))
    NULL
  }, a.combine = 'c')
) # 78 sec

# compute row prods, possibly in parallel
system.time(
  prods <- big_apply(fbm, a.FUN = function(X, ind) {
    print(min(ind))
    matrixStats::rowProds(X[ind, ])
  }, a.combine = 'c', ind = rows_along(fbm),
  block.size = 100e3, ncores = nb_cores())  
) # 22 sec with 1 core and 18 sec with 6 cores
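
The block-wise idea behind big_apply can be illustrated without any package. The following is a minimal base R sketch, assuming a small in-memory matrix as a stand-in for the on-disk data; the helper name block_row_prods and the block size are hypothetical and chosen only for illustration. It is not how bigstatsr works internally, only the same split, apply, and combine pattern.

```r
# A small random matrix stands in for the large on-disk matrix.
set.seed(1)
m <- matrix(runif(1000 * 5, min = 0.5, max = 1.5), nrow = 1000)

# Hypothetical helper: compute row products one block of rows at a time,
# so only block_size rows need to be processed in any single step.
block_row_prods <- function(mat, block_size = 100) {
  starts <- seq(1, nrow(mat), by = block_size)
  pieces <- lapply(starts, function(s) {
    ind <- s:min(s + block_size - 1, nrow(mat))
    # drop = FALSE keeps a one-row block as a matrix
    apply(mat[ind, , drop = FALSE], 1, prod)
  })
  # combine the per-block results, like a.combine = 'c'
  unlist(pieces, use.names = FALSE)
}

# 300 does not divide 1000 evenly, so the last block is shorter.
prods_blockwise <- block_row_prods(m, block_size = 300)
```

Any faster row-product function could replace apply(..., 1, prod) inside the helper; the point is that peak memory for intermediate results depends on the block size rather than on the full number of rows.
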
F. Privé answered Oct 05 '22 06:10


Try the data.table package with Reduce. That might avoid internal copies of a vector of length 1e10.

library(data.table)
df <- data.table(df, keep.rownames=TRUE)
df[, rowprods:= Reduce("*", .SD), .SDcols = -1]
df[, .(rn, rowprods)]
#                     rn   rowprods
# 1:           Mazda RX4          0
# 2:       Mazda RX4 Wag          0
# 3:          Datsun 710  609055152
# 4:      Hornet 4 Drive          0
# 5:   Hornet Sportabout          0
# 6:             Valiant          0
# 7:          Duster 360          0
# 8:           Merc 240D          0
# 9:            Merc 230          0
#10:            Merc 280          0
#11:           Merc 280C          0
#12:          Merc 450SE          0
#13:          Merc 450SL          0
#14:         Merc 450SLC          0
#15:  Cadillac Fleetwood          0
#16: Lincoln Continental          0
#17:   Chrysler Imperial          0
#18:            Fiat 128  470578906
#19:         Honda Civic  564655046
#20:      Toyota Corolla  386281789
#21:       Toyota Corona          0
#22:    Dodge Challenger          0
#23:         AMC Javelin          0
#24:          Camaro Z28          0
#25:    Pontiac Firebird          0
#26:           Fiat X1-9  339825992
#27:       Porsche 914-2          0
#28:        Lotus Europa 1259677924
#29:      Ford Pantera L          0
#30:        Ferrari Dino          0
#31:       Maserati Bora          0
#32:          Volvo 142E 1919442833
#                     rn   rowprods
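
To see why Reduce gives row products, here is a minimal base R sketch, assuming the same mtcars matrix as in the question. Reduce multiplies the columns together element-wise, one column at a time, so each intermediate result is a single vector with one entry per row. The variable names cols, by_reduce and by_apply are introduced here only for illustration.

```r
df <- as.matrix(mtcars)

# Split the matrix into a list of column vectors.
cols <- lapply(seq_len(ncol(df)), function(j) df[, j])

# Multiply the columns together element-wise: ((col1 * col2) * col3) * ...
by_reduce <- Reduce(`*`, cols)

# Reference result: the row-by-row product from the question.
by_apply <- apply(df, 1, prod)

# Compare the two approaches up to floating-point tolerance.
same <- isTRUE(all.equal(unname(by_reduce), unname(by_apply)))
```

The data.table version does the same thing, with .SD supplying the list of columns.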

However, 8 GB of RAM (minus what your OS and other software need) is not much if you want to work with data of this size. R sometimes needs to make internal copies of your data in order to use it.
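
As a rough back-of-the-envelope check, assuming the matrix holds double-precision values at 8 bytes each (the question does not state the storage type), the size of the data can be estimated as follows. The variable names are illustrative.

```r
n_rows <- 1e8           # 100 million rows, as in the question
n_cols <- 100
bytes_per_double <- 8   # assumption: values are stored as doubles

# Size of the full matrix and of the result vector, in GiB.
total_gib  <- n_rows * n_cols * bytes_per_double / 2^30
result_gib <- n_rows * bytes_per_double / 2^30

round(c(matrix = total_gib, result = result_gib), 2)
```

Under that assumption the full matrix is several times larger than 8 GB of RAM before any copies are made, while the vector of row products on its own is well under 1 GiB.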

Roland answered Oct 05 '22 06:10