I have a data.table, and want to exclude some set of columns. For example,
library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4, d = 4:5)
dt[ , .(b, c)]
Gives me the second and third column, b and c. How do I instead EXCLUDE columns b and c. Coming from the data.frame world, I would expect something like the following:
dt[ , -.(b, c)]
or, maybe
dt[ , !.(b, c)]
But neithr of these work. I know I can use
dt[ , -c(2:3), with = FALSE]
but this just (as I understand it) casts the data.table as a data.frame and then uses the standard operations. I would like to avoid this, since it is a) kind of cheating, an b) gives up the speed boosts available in data.table. I reviewed the data.table FAQ, and the vignette, and cannot seem to find anything.
(I know this is all very simplistic, and I could just select the other two columns. However, this is a microcosm of a much, MUCH bigger data.table I am working with.)
Using the "Excel" actions, write the data table to a Microsoft Excel worksheet ("Write to Excel Worksheet" action). Use the "Delete Row/Column from Excel Worksheet" action to delete the desired column. Store the result into another data table variable with the "Read from Excel Worksheet" action.
The most easiest way to drop columns is by using subset() function. In the code below, we are telling R to drop variables x and z. The '-' sign indicates dropping variables. Make sure the variable names would NOT be specified in quotes when using subset() function.
To add an Exclude Columns transformation to data that is already loaded into Spotfire: Select Data > Transform data.... In the Data table drop-down list, select the data table you want to add the transformation to. Select Exclude columns from the drop-down list and click Add....
Also, in case you would not wish to change the data.table, but merely return the columns except some columns, you can do:
dt[,.SD, .SDcols = !c('b', 'c')]
which returns the required result of:
a d
1: 1 4
2: 2 5
while dt remains unchanged:
> dt
a b c d
1: 1 2 3 4
2: 2 3 4 5
We can use setdiff
dt[, setdiff(names(dt), c("b", "c")), with = FALSE]
or we can assign to NULL
(as in the other answer) but in a single step
dt[, c("b", "c") := NULL][]
You can do:
dt[ , b := NULL][ , c := NULL]
or you can use a list of columns to be removed:
xx <- c("b","c") # vector of columns you DON'T want
# subset
dt <- dt[, !xx, with = FALSE]
you can always just do:
dt[ , -c("b", "c")]
although this uses the data.fame sintax and as the problems you describe, particularly it seems to be much slower on large data sets.
Another way using set
:
set(dt,, c("b", "c"), NULL)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With