Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

using data.table to replace multiple columns on single condition

Tags:

r

data.table

I want to change the default value (which is 255) to NA.

dt <- data.table(x = c(1,5,255,0,NA), y = c(1,7,255,0,0), z = c(4,2,7,8,255))
coords <- c('x', 'y')

Which gives the following code:

     x   y   z
1:   1   1   4
2:   5   7   2
3: 255 255   7
4:   0   0   8
5:  NA   0 255

I the furthest I came up with is this:

dt[.SD == 255, (.SD) := NA, .SDcols = coords]

Please note that column z stays the same. So just the columns which are specified and not all columns.

But that doesn't help me to get the sollution:

     x   y   z
1:   1   1   4
2:   5   7   2
3:  NA  NA   7
4:   0   0   8
5:  NA   0 255

I am looking for a sustainable solution because the original dataset is a couple of million rows.

EDIT:

I have found a solution but it is quite ugly and is definately too slow as it takes almost 10 seconds to get through a dataframe of 22009 x 86. Does anyone have a better solution?

The code:

dt[, replace(.SD, .SD == 255, NA), .SDcols = coords, by = c(colnames(dt)[!colnames(dt) %in% coords])]

like image 977
Tunder250 Avatar asked Mar 06 '23 02:03

Tunder250


2 Answers

Here is how you can keep the columns outside .SDcols,

library(data.table)
dt[, (coords) := replace(.SD, .SD == 255, NA), .SDcols = coords]

which gives,

    x  y   z
1:  1  1   4
2:  5  7   2
3: NA NA   7
4:  0  0   8
5: NA  0 255
like image 56
Sotos Avatar answered May 01 '23 01:05

Sotos


You could also do:

require(data.table)
dt[ ,
    (coords) := lapply(.SD, function(x) fifelse(x == 255, NA_real_, x)),
    .SDcols = coords ]

Having compared it to Sotos' answer, it also seems a little bit faster.

like image 26
andschar Avatar answered May 01 '23 03:05

andschar