
Assigning a data.table slice in R

Tags: r, data.table

In order to read a data.table slice, I can use the following syntax:

foo = DT[, 5:10, with=F]

but now I want to do:

foo = foo + 1
DT[, 5:10, with=F] = foo

This doesn't work; referring to the columns by names also doesn't seem to work. Any suggestions?

asked Jan 13 '14 20:01 by rimorob



1 Answer

It's a little more subtle. This is how I read your question and how you're trying to do it at the moment ...

Your first line creates a new data.table object holding a copy of the 6-column subset:

foo = DT[, 5:10, with=F]

I'm immediately thinking of the memory implications. If each column is 1GB, that's a 6GB new object you just allocated.

Then you add 1 to everything in that 6GB:

foo = foo + 1   # or something like that, that works

That copies the 6GB into another new 6GB object.

Then you copy the 6GB foo back into where it was in DT in the first place:

DT[, 5:10, with=F] = foo    # or something like that, that works

That's really memory inefficient. It's a base R way of doing things.
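
For reference, the copy-based version can be made to run. This is a minimal sketch, not a recommendation: the small table is a hypothetical stand-in for the real DT, and assigning back with := and a character vector of column names is just one way to do the final step.

```r
library(data.table)

# Hypothetical toy table standing in for DT: 10 numeric columns V1..V10, 3 rows
DT <- as.data.table(matrix(as.numeric(1:30), nrow = 3))
cols <- names(DT)[5:10]

foo <- DT[, cols, with = FALSE]   # first copy: a new 6-column data.table
foo <- foo + 1                    # second copy: the result of the addition
DT[, (cols) := foo]               # write the 6 new columns back into DT
```

It gives the right values, but it passes through the intermediate copies described above.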

In data.table you can loop, and you can use set(). I would just do it in an easy-to-read, easy-to-understand loop:

for (col in 5:10) {
    set(DT, j = col, value = DT[[col]] + 1)
}

This changes each column by reference, one at a time. DT[[col]] doesn't copy the column contents (that's nothing special to data.table; base R doesn't copy there either). The +1 does create a new vector, but that new vector is then plonked directly into the column pointer slot, so it's as efficient as it can be given that +1 returns a new object.
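
To see the by-reference behaviour on something small, here is a self-contained sketch. The toy table is hypothetical; address() is data.table's helper that returns an object's memory address as a string, used here only to check that DT itself is not reallocated by the loop.

```r
library(data.table)

# Hypothetical toy table: 10 numeric columns V1..V10, 3 rows
DT <- as.data.table(matrix(as.numeric(1:30), nrow = 3))
before <- address(DT)            # address of the data.table before the update

for (col in 5:10) {
  # DT[[col]] + 1 builds one new column vector; set() puts it into DT in place
  set(DT, j = col, value = DT[[col]] + 1)
}

after <- address(DT)             # address of the data.table after the update
print(identical(before, after))  # compare the two addresses
print(DT[, 5:10, with = FALSE])  # the six updated columns
```

At most one new column-sized vector exists at a time, rather than a second and third copy of all six columns.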

answered Sep 28 '22 05:09 by Matt Dowle
