R data.table multi column recode/sub-assign [duplicate]

Let DT be a data.table:

DT <- data.table(V1=sample(10),
                 V2=sample(10),
                 ...
                 V9=sample(10))

Is there a better/simpler method to do multicolumn recode/sub-assign like this:

DT[V1==1 | V1==7,V1:=NA]
DT[V2==1 | V2==7,V2:=NA]
DT[V3==1 | V3==7,V3:=NA]
DT[V4==1 | V4==7,V4:=NA]
DT[V5==1 | V5==7,V5:=NA]
DT[V6==1 | V6==7,V6:=NA]
DT[V7==1 | V7==7,V7:=NA]
DT[V8==1 | V8==7,V8:=NA]
DT[V9==1 | V9==7,V9:=NA]

Variable names are completely arbitrary and do not necessarily contain numbers. There are many columns (Vx:Vx) and one recode pattern for all of them (NAME==1 | NAME==7, NAME:=something).

Further, how can I sub-assign NAs in multiple columns to something else? E.g., in data.frame style:

data[,columns][is.na(data[,columns])] <- a_value
Tomasz Jerzyński asked Jul 30 '15 10:07


1 Answer

You could use set to replace values in multiple columns. According to ?set, it is fast because the overhead of [.data.table is avoided. We loop over the columns with a for loop and, in each column j, replace the values in the rows selected by i (those equal to 1 or 7) with NA.

for (j in seq_along(DT)) {
  set(DT, i = which(DT[[j]] %in% c(1, 7)), j = j, value = NA)
}
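The loop above touches every column. Since the question mentions arbitrary column names, here is a minimal sketch of the same idea restricted to a chosen set of columns. The column names (id, alpha, beta, gamma) and the cols vector are hypothetical, picked only for illustration; set accepts column names as well as column numbers for j.

```r
library(data.table)

set.seed(24)
# Hypothetical, non-numbered column names (illustration only)
DT <- data.table(id = 1:10,
                 alpha = sample(10),
                 beta = sample(10),
                 gamma = sample(10))

# Only these columns are recoded; id and gamma are left untouched
cols <- c("alpha", "beta")

for (col in cols) {
  # Row numbers where the current column equals 1 or 7
  rows <- which(DT[[col]] %in% c(1, 7))
  # Overwrite those cells by reference; NA_integer_ matches the integer columns
  set(DT, i = rows, j = col, value = NA_integer_)
}
```

Because set modifies DT by reference, no reassignment is needed after the loop.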

EDIT: Included @David Arenburg's comments.

data

library(data.table)
set.seed(24)
DT <- data.table(V1 = sample(10), V2 = sample(10), V3 = sample(10))
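For the second part of the question (replacing NAs in several columns with another value), the same set loop should carry over, with is.na supplying the row index. This is a sketch under assumptions: the small table, the cols vector, and the replacement a_value = 0L are made up for illustration.

```r
library(data.table)

# Small hypothetical table with some NAs in each column
DT <- data.table(V1 = c(1L, NA, 3L),
                 V2 = c(NA, 5L, 6L),
                 V3 = c(7L, 8L, NA))

cols <- c("V1", "V2")  # columns whose NAs should be replaced (assumed)
a_value <- 0L          # replacement value (assumed)

for (col in cols) {
  # Rows where the current column is NA
  rows <- which(is.na(DT[[col]]))
  # Replace those cells by reference
  set(DT, i = rows, j = col, value = a_value)
}
```

V3 is not listed in cols, so its NA is left as it was.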
akrun answered Oct 18 '22 13:10