
R: Fastest way to do row wise computation on multiple columns of a data frame

I have a data frame to which I want to add a column computed from three other columns. The method I am using right now seems to be very slow. Is there a better way to do this? Here is my current approach:

library(bitops)

GetRes<-function(A, B, C){
  tagU <- bitShiftR((A*C), 4)
  tagV <- bitShiftR(B, 2)

  x<-tagU %% 2
  y<-tagV %% 4

  res<-(2*x + y) %% 4
  return(res)
}

df <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
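# apply() coerces the data frame to a matrix, so pass only the numeric columns;
# with the character id column included, every value becomes a string
apply(df[, 2:4], 1, function(x) GetRes(x[1], x[2], x[3]))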

My data frame is very big and this computation is taking ages. Can someone suggest a better way to do it?

Thanks.

Rachit Agrawal asked Dec 26 '22 05:12


2 Answers

Try mapply:

mapply(GetRes, df[,2], df[,3], df[,4])

If you let us know which package bitShiftR is from, we can test it on bigger data to see if there is any performance boost.

UPDATE
A quick benchmark shows that mapply is about twice as fast as your apply call:

microbenchmark(apply(df[,2:4], 1, function(x) GetRes(x[1], x[2], x[3])), mapply(GetRes, df[,2], df[,3], df[,4]))
Unit: microseconds
                                                      expr     min       lq   median      uq      max neval
 apply(df[, 2:4], 1, function(x) GetRes(x[1], x[2], x[3])) 196.985 201.6200 206.7515 216.187 1006.775   100
                 mapply(GetRes, df[, 2], df[, 3], df[, 4])  99.982 105.6105 108.7560 112.232  149.311   100
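
The benchmark above uses only the three-row sample, so here is a rough sketch of how the comparison could be repeated on bigger data. The data frame `big`, its size `n`, and the value range are assumptions made up for illustration, and `system.time` stands in for microbenchmark to keep the example short. It times the mapply call against a single call to GetRes on whole columns and checks that both give the same result. Timings will vary by machine, so none are shown.

```r
library(bitops)  # provides bitShiftR, as used in the question

# GetRes as defined in the question
GetRes <- function(A, B, C) {
  tagU <- bitShiftR(A * C, 4)
  tagV <- bitShiftR(B, 2)
  x <- tagU %% 2
  y <- tagV %% 4
  (2 * x + y) %% 4
}

# Hypothetical larger data set: n rows of small non-negative integers,
# kept small so that val0 * val2 stays well inside the integer range
set.seed(1)
n <- 1e5
big <- data.frame(
  val0 = sample(0:1000, n, replace = TRUE),
  val1 = sample(0:1000, n, replace = TRUE),
  val2 = sample(0:1000, n, replace = TRUE)
)

# Row-by-row evaluation via mapply
t_mapply <- system.time(r_mapply <- mapply(GetRes, big$val0, big$val1, big$val2))

# One call on whole columns
t_vec <- system.time(r_vec <- GetRes(big$val0, big$val1, big$val2))

# Both approaches should produce the same values
stopifnot(isTRUE(all.equal(r_mapply, r_vec)))

cat("mapply elapsed (s):    ", t_mapply[["elapsed"]], "\n")
cat("vectorized elapsed (s):", t_vec[["elapsed"]], "\n")
```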
CHP answered Apr 19 '23 23:04


Everything you're doing inside GetRes is already vectorized, so calling it once on whole columns is much faster than any row-by-row alternative you'll be offered. You can just call this:

with(df, GetRes(val0, val1, val2))

or this

GetRes(df$val0, df$val1, df$val2)

or this

GetRes(df[,2], df[,3], df[,4])
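
Since the question asks for the result as a new column, a minimal sketch of that last step might look like the following. The column name `res` is an assumption chosen for illustration. The final check compares the vectorized result with the row-by-row mapply version on the sample data.

```r
library(bitops)  # provides bitShiftR, as used in the question

# GetRes as defined in the question; each operation works element-wise
GetRes <- function(A, B, C) {
  tagU <- bitShiftR(A * C, 4)
  tagV <- bitShiftR(B, 2)
  x <- tagU %% 2
  y <- tagV %% 4
  (2 * x + y) %% 4
}

df <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# One call over whole columns; `res` is a hypothetical column name
df$res <- GetRes(df$val0, df$val1, df$val2)

# Sanity check against the row-by-row version
row_wise <- mapply(GetRes, df$val0, df$val1, df$val2)
stopifnot(isTRUE(all.equal(df$res, row_wise)))

print(df)  # the res column holds 1, 3, 3 for the three sample rows
```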
John answered Apr 19 '23 23:04