How to divide each row of a matrix by elements of a vector in R

Tags: r, vector, matrix

I would like to divide each row of a matrix by a fixed vector. For example:

mat<-matrix(1,ncol=2,nrow=2,TRUE)
dev<-c(5,10)

Evaluating mat/dev divides each column by dev:

     [,1] [,2]
[1,]  0.2  0.2
[2,]  0.1  0.1
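The column-wise behaviour follows from R storing matrices in column-major order and recycling the shorter vector along that order. A minimal sketch of this, using a hypothetical 2x3 matrix `m` (not part of the question) so the recycling pattern is easier to see:

```r
# Hypothetical example matrix, filled column by column with 1..6
m <- matrix(1:6, nrow = 2)
dev <- c(5, 10)

# dev is recycled down each column, so element [i, j] is divided by dev[i]
by_recycling <- m / dev

# The same operation written out explicitly with row indices
explicit <- m / dev[row(m)]

stopifnot(identical(by_recycling, explicit))
```

In other words, `m / dev` pairs `dev[1]` with row 1 and `dev[2]` with row 2, which is the opposite of the pairing wanted here.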

However, I would like to get the following result instead, i.e. perform the operation row-wise:

rbind(mat[1,]/dev, mat[2,]/dev)

     [,1] [,2]
[1,]  0.2  0.1
[2,]  0.2  0.1

Is there an explicit command to get there?

asked Dec 15 '13 15:12 by tomka


1 Answer

Here are a few ways in order of increasing code length:

t(t(mat) / dev)

mat / dev[col(mat)] #  @DavidArenburg & @akrun

mat %*% diag(1 / dev)

sweep(mat, 2, dev, "/")

t(apply(mat, 1, "/", dev))

plyr::aaply(mat, 1, "/", dev)

mat / rep(dev, each = nrow(mat))

mat / t(replace(t(mat), TRUE, dev))

mapply("/", as.data.frame(mat), dev)  # added later

mat / matrix(dev, nrow(mat), ncol(mat), byrow = TRUE)  # added later

do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev))

mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev
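As a sanity check, several of the base-R variants above can be compared against the row-by-row result from the question. This is only a sketch: the 3x2 test matrix and the names `expected`, `results` and `matches` are made up for illustration, and a non-square matrix is used so that dimension mistakes become visible.

```r
# Hypothetical test data: a non-square matrix with distinct entries
mat <- matrix(1:6, nrow = 3)   # 3 rows, 2 columns
dev <- c(5, 10)                # one divisor per column

# Reference result built row by row, as in the question
expected <- rbind(mat[1, ] / dev, mat[2, ] / dev, mat[3, ] / dev)

results <- list(
  transpose = t(t(mat) / dev),
  col_index = mat / dev[col(mat)],
  sweep     = sweep(mat, 2, dev, "/"),
  rep_each  = mat / rep(dev, each = nrow(mat)),
  byrow     = mat / matrix(dev, nrow(mat), ncol(mat), byrow = TRUE)
)

# One logical per variant: TRUE if it matches the reference up to tolerance
matches <- vapply(results, function(r) isTRUE(all.equal(r, expected)), logical(1))
print(matches)
```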

Data Frames

All the solutions that begin with mat / also work if mat is a data frame and produce a data frame result. The same is true of the sweep solution and the last one, i.e. the mat2 solution. The mapply solution works with data frames but produces a matrix.
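The difference in return type can be checked directly. A small sketch, assuming a hypothetical data frame `df` built from a 3x2 matrix (not the data used elsewhere in this answer):

```r
# Hypothetical data frame input with two numeric columns
df <- as.data.frame(matrix(1:6, nrow = 3))
dev <- c(5, 10)

swept  <- sweep(df, 2, dev, "/")   # expected to stay a data frame
mapped <- mapply("/", df, dev)     # expected to be simplified to a matrix

print(class(swept))
print(class(mapped))

# The values agree once both are stripped down to plain matrices
same_values <- isTRUE(all.equal(unname(as.matrix(swept)), unname(mapped)))
print(same_values)
```

If a data frame is needed from the mapply variant, wrapping the result in as.data.frame() is one option.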

Vector

If mat is a plain vector rather than a matrix, then either of these returns a one-column matrix:

t(t(mat) / dev)

mat / t(replace(t(mat), TRUE, dev))

and this one returns a vector:

plyr::aaply(mat, 1, "/", dev) 

The others give an error, a warning, or a result other than the desired one.
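To see where the one-column shape comes from, here is a minimal sketch. It assumes the plain vector has one element per divisor, i.e. it stands for a single row; the name `v` is hypothetical.

```r
# Hypothetical plain-vector input with one element per divisor
v <- c(1, 1)
dev <- c(5, 10)

# t(v) is a 1 x 2 row matrix, so the final transpose yields a 2 x 1 matrix
res <- t(t(v) / dev)
print(dim(res))

# Dropping the dimensions recovers an ordinary vector if that is preferred
res_vec <- as.vector(res)
print(res_vec)
```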

Benchmarks

The brevity and clarity of the code may be more important than speed, but for completeness here are some benchmarks using 10 repetitions and then 100 repetitions.

library(microbenchmark)
library(plyr)

set.seed(84789)

mat<-matrix(runif(1e6),nrow=1e5)
dev<-runif(10)

microbenchmark(times=10L,
  "1" = t(t(mat) / dev),
  "2" = mat %*% diag(1/dev),
  "3" = sweep(mat, 2, dev, "/"),
  "4" = t(apply(mat, 1, "/", dev)),
  "5" = mat / rep(dev, each = nrow(mat)),
  "6" = mat / t(replace(t(mat), TRUE, dev)),
  "7" = aaply(mat, 1, "/", dev),
  "8" = do.call(rbind, lapply(as.data.frame(t(mat)), "/", dev)),
  "9" = {mat2 <- mat; for(i in seq_len(nrow(mat2))) mat2[i, ] <- mat2[i, ] / dev},
 "10" = mat/dev[col(mat)])

giving:

Unit: milliseconds
 expr         min          lq       mean      median          uq        max neval
    1    7.957253    8.136799   44.13317    8.370418    8.597972  366.24246    10
    2    4.678240    4.693771   10.11320    4.708153    4.720309   58.79537    10
    3   15.594488   15.691104   16.38740   15.843637   16.559956   19.98246    10
    4   96.616547  104.743737  124.94650  117.272493  134.852009  177.96882    10
    5   17.631848   17.654821   18.98646   18.295586   20.120382   21.30338    10
    6   19.097557   19.365944   27.78814   20.126037   43.322090   48.76881    10
    7 8279.428898 8496.131747 8631.02530 8644.798642 8741.748155 9194.66980    10
    8  509.528218  524.251103  570.81573  545.627522  568.929481  821.17562    10
    9  161.240680  177.282664  188.30452  186.235811  193.250346  242.45495    10
   10    7.713448    7.815545   11.86550    7.965811    8.807754   45.87518    10

Re-running the test with 100 repetitions on those that took roughly 20 milliseconds or less:

microbenchmark(times=100L,
  "1" = t(t(mat) / dev),
  "2" = mat %*% diag(1/dev),
  "3" = sweep(mat, 2, dev, "/"),
  "5" = mat / rep(dev, each = nrow(mat)),
  "6" = mat / t(replace(t(mat), TRUE, dev)),
 "10" = mat/dev[col(mat)])

giving:

Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
    1  8.010749  8.188459 13.972445  8.560578 10.197650 299.80328   100
    2  4.672902  4.734321  5.802965  4.769501  4.985402  20.89999   100
    3 15.224121 15.428518 18.707554 15.836116 17.064866  42.54882   100
    5 17.625347 17.678850 21.464804 17.847698 18.209404 303.27342   100
    6 19.158946 19.361413 22.907115 19.772479 21.142961  38.77585   100
   10  7.754911  7.939305  9.971388  8.010871  8.324860  25.65829   100

So in both of these tests, #2 (using diag) is the fastest. The reason may lie in its almost direct appeal to the BLAS, whereas #1 relies on the costlier t, which it calls twice.
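One caveat about #2: it multiplies by the reciprocals 1 / dev rather than dividing, so its output may differ from the division-based variants in the last few bits. A sketch of a tolerance-based comparison, using a small hypothetical input (the seed and sizes here are arbitrary and unrelated to the benchmark above):

```r
set.seed(1)                           # arbitrary seed, only for reproducibility
mat <- matrix(runif(20), nrow = 10)   # hypothetical small input, 10 x 2
dev <- runif(2)

by_division <- t(t(mat) / dev)
by_matmul   <- mat %*% diag(1 / dev)

# Compare with a numerical tolerance rather than bit-for-bit equality
agree <- isTRUE(all.equal(by_division, by_matmul))
print(agree)
```

Also note that diag() treats a single number as a matrix size rather than as a diagonal entry, so for a one-column matrix something like diag(1 / dev, nrow = 1) would presumably be needed instead.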

answered Sep 23 '22 04:09 by G. Grothendieck