Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Row wise operation on data.table

Tags:

r

data.table

Let's say I'd like to calculate the magnitude of the range over a few columns, on a row-by-row basis.

set.seed(1)
dat <- data.frame(x=sample(1:1000,1000),
                  y=sample(1:1000,1000),
                  z=sample(1:1000,1000))

Using data.frame(), I would do something like this:

dat$diff_range <- apply(dat,1,function(x) diff(range(x)))

To put it more simply, I'm looking for this operation, over each row:

diff(range(dat[1,]) # for i 1:nrow(dat)

If I were doing this for the entire table, it would be something like:

setDT(dat)[,diff_range := apply(dat,1,function(x) diff(range(x)))]

But how would I do it for only named (or numbered) rows?

like image 587
Brandon Bertelsen Avatar asked Dec 25 '22 14:12

Brandon Bertelsen


2 Answers

pmax and pmin find the min and max across columns in a vectorized way, which is much better than splitting and working with each row separately. It's also pretty concise:

dat[, r := do.call(pmax,.SD) - do.call(pmin,.SD)]


        x   y   z   r
   1: 266 531 872 606
   2: 372 685 967 595
   3: 572 383 866 483
   4: 906 953 437 516
   5: 201 118 192  83
  ---                
 996: 768 945 292 653
 997:  61 231 965 904
 998: 771 145  18 753
 999: 841 148 839 693
1000: 857 252 218 639
like image 79
Frank Avatar answered Dec 28 '22 09:12

Frank


How about this:

D[,list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I][c(1:4,15:18)]
#    I  V1
#1:  1 971
#2:  2 877
#3:  3 988
#4:  4 241
#5: 15 622
#6: 16 684
#7: 17 971
#8: 18 835

#actually this will be faster
D[c(1:4,15:18),list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I]

use .I to give you an index to call with the by= parameter, then you can run the function on each row. The second call pre-filters by any list of row numbers, or you can add a key and filter on that if your real table looks different.

like image 39
Troy Avatar answered Dec 28 '22 08:12

Troy