
Removing multiple columns from R data.table with parameter for columns to remove

Tags: r, data.table

I'm trying to manipulate a number of data.tables in similar ways and would like to write a function to do so. I would like to pass in a parameter containing the columns on which the operations should be performed. This works fine when the vector of column names is written directly on the left-hand side of the := operator, but not if it is declared earlier (or passed into the function). The following code shows the issue.

    library(data.table)
    dt = data.table(a = letters, b = 1:2, c = 1:13)
    colsToDelete = c('b', 'c')
    dt[, colsToDelete := NULL]  # doesn't work but I don't understand why not.
    dt[, c('b', 'c') := NULL]   # works fine, but doesn't allow passing in of columns

The error is "Adding new column 'colsToDelete' then assigning NULL (deleting it)." So clearly, it's interpreting 'colsToDelete' as a new column name.

The same issue occurs when doing something along these lines

    dt[, colNames := lapply(.SD, adjustValue, y=factor), .SDcols = colNames]

I'm new to R, but rather more experienced with some other languages, so this may be a silly question.

user3704757 asked Jul 05 '14 20:07




1 Answer

It's basically because we allow a bare symbol on the LHS of := as a convenient way to add a new column, e.g. DT[, col := val]. So, in order to distinguish col itself being the column name from col being a variable that holds the column names, we check whether the LHS is a name or an expression.

If it's a name, the column is added under that literal name; if it's an expression, the expression is evaluated and its result is used as the column name(s).

    DT[, col := val]     # col is the column name
    DT[, (col) := val]   # col gets evaluated and replaced with its value
    DT[, c(col) := val]  # same as above
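To see the difference concretely, here is a small runnable sketch. The table DT and its columns id, x and y are made up for illustration and are not from the question.

```r
library(data.table)

# Illustrative data only: these column names are assumptions, not from the question
DT <- data.table(id = 1:3, x = c(10, 20, 30))
col <- "y"                # a variable holding the name of the target column

DT[, col := x * 2]        # bare symbol: creates a column literally named "col"
DT[, (col) := x * 3]      # parenthesised: col is evaluated, so column "y" is created

names(DT)                 # "id" "x" "col" "y"
```

Wrapping the symbol in parentheses is enough to turn it from a name into an expression, which is why (col) and c(col) behave the same way.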

The preferred idiom is: dt[, (colsToDelete) := NULL]
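Applied to the two cases in the question, this could look roughly like the sketch below. dropCols is a hypothetical helper name, adjustValue here is a stand-in (the question does not show its definition), and the sample data is made up.

```r
library(data.table)

# Hypothetical helper: remove the given columns from dt by reference
dropCols <- function(dt, cols) {
  cols <- intersect(cols, names(dt))      # ignore names that are not present
  if (length(cols)) dt[, (cols) := NULL]  # (cols) is evaluated, not taken as a name
  invisible(dt)
}

# Stand-in for the question's adjustValue, whose real definition is not shown
adjustValue <- function(x, y) x * y

dt <- data.table(a = letters[1:4], b = 1:4, c = 5:8, d = 9:12)

# Second case from the question: update the columns listed in colNames in place
colNames <- c("b", "c")
dt[, (colNames) := lapply(.SD, adjustValue, y = 2), .SDcols = colNames]

# First case: delete columns whose names are passed in as a parameter
dropCols(dt, c("c", "d"))
names(dt)                                 # "a" "b"
```

Because := modifies the table by reference, the caller's dt is changed even though the deletion happens inside the function; the intersect() guard is an optional extra so that names not present in the table are skipped.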

HTH

Arun answered Oct 06 '22 13:10
