
Remove multiple columns from data.table

Tags: r, data.table

What's the correct way to remove multiple columns from a data.table? I'm currently using the code below, but was getting unexpected behavior when I accidentally repeated one of the column names. I wasn't sure if this was a bug, or if I shouldn't be removing columns this way.

library(data.table)
DT <- data.table(x = letters, y = letters, z = letters)
DT[, c("x", "y") := NULL]
names(DT)
[1] "z"

The above works fine, but repeating a column name also removes column y:

DT <- data.table(x = letters, y = letters, z = letters)
DT[, c("x", "x") := NULL]
names(DT)
[1] "z"

matt_k asked May 19 '13 19:05




1 Answer

This looks like a solid, reproducible bug. It's been filed as Bug #2791.

It appears that repeating a column name makes data.table attempt to delete the columns that follow it as well.
If no columns remain, R crashes.


UPDATE: Now fixed in v1.8.11. From NEWS:

Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., DT[,c("B","B"):=NULL] (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting. Tests added.
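Since repeating a name is now an error, one way to guard against accidental duplicates is to de-duplicate the names before deleting. This is only a minimal sketch, assuming a reasonably recent data.table where the `(cols) := NULL` form is available; `drop_cols` and `cols` are hypothetical names used for illustration.

```r
library(data.table)

DT <- data.table(x = letters, y = letters, z = letters)

# Hypothetical vector of columns to drop; "x" is repeated by accident.
drop_cols <- c("x", "x", "y")

# Remove duplicate names so each column is deleted exactly once.
cols <- unique(drop_cols)

# Parentheses make data.table treat `cols` as a vector of column names
# rather than as a new column literally called "cols".
DT[, (cols) := NULL]

names(DT)
```

After this, only column z should remain, and the deletion still happens by reference.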

Ricardo Saporta answered Oct 05 '22 22:10