Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data.table replacing a value by NA [duplicate]

Tags:

r

data.table

I want to replace 0 by NA in each column of a data.table.

library(data.table)
dt1 <- data.table(V1=0:2, V2=2:0)
dt1

   V1 V2
1:  0  2
2:  1  1
3:  2  0

dt1==0
       V1    V2
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

I tried this one

dt1[dt1==0] 
Error in `[.data.table`(dt1, dt1 == 0) : 
  i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please let datatable-help know if you'd like this, or add your comments to FR #1611.

And also tried this one

dt1[dt1==0, .SD :=NA] 

Edited

Partial sessionInfo()

R version 3.2.1 (2015-06-18)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 14.04.2 LTS

data.table_1.9.4
like image 744
MYaseen208 Avatar asked Aug 01 '15 12:08

MYaseen208


People also ask

How to replace 0 with NA in one column in R?

Using R replace() function to update 0 with NA R has a built-in function called replace() that replaces values in a vector with another value, for example, zeros with NAs.

How to replace a specific Value in R?

To replace a column value in R use square bracket notation df[] , By using this you can update values on a single column or on all columns. To refer to a single column use df$column_name .

What is Data table in R?

data.table is an R package that provides an enhanced version of data.frame s, which are the standard data structure for storing data in base R. In the Data section above, we already created a data.table using fread() . We can also create one using the data.table() function.


2 Answers

You can try set for multiple columns. It will be faster as the overhead of .[data.table is avoided

for(j in seq_along(dt1)){
         set(dt1, i=which(dt1[[j]]==0), j=j, value=NA)
}
dt1
#   V1 V2
#1: NA  2
#2:  1  1
#3:  2 NA

Or another option would be looping with lapply and then change the 0 values to NA with replace

dt1[, lapply(.SD, function(x) replace(x, which(x==0), NA))]

Or we can make use of some arthithmetic operations to convert the 0 value to NA.

 dt1[, lapply(.SD, function(x) (NA^!x) *x)]

The way (NA^!x)*x this works is by converting the !x i.e. a logical TRUE/FALSE vector for each column (where TRUE corresponds to 0 value) to NA and 1 by doing NA^!x. We multiply with the x value to replace the 1 with the x value corresponding to it while the NA will remain as such.

Or a syntax similar to base R would be

  is.na(dt1) <- dt1==0

But this method may not be that efficient for large data.table as dt1==0 would be a logical matrix and also as @Roland mentioned in the comments that the dataset would be copied. I would either use the lapply based or the more efficient set for larger datasets.

like image 87
akrun Avatar answered Nov 01 '22 23:11

akrun


dt1[dt1==0] <- NA worked for me.

dt1[dt1==0] <- NA
dt1
##   V1 V2
##1: NA  2
##2:  1  1
##3:  2 NA

As noted by Roland, this does make a copy of the data.table object, and will be slower.

like image 3
alexforrence Avatar answered Nov 02 '22 00:11

alexforrence