I need to drop one column from a data.frame containing a few hundred columns.
With a data.frame
, I'd use subset
to do this conveniently:
> dat <- data.table( data.frame(x=runif(10),y=rep(letters[1:5],2),z=runif(10)),key='y' )
> subset(dat,select=c(-z))
x y
1: 0.1969049 a
2: 0.7916696 a
3: 0.9095970 b
4: 0.3529506 b
5: 0.4923602 c
6: 0.5993034 c
7: 0.1559861 d
8: 0.9929333 d
9: 0.3980169 e
10: 0.1921226 e
Obviously this still works, but it seems like not a very data.table
-like idiom. I could manually construct a list of the column names I wanted to keep, which seems a little more data.table
-like:
> dat[,list(x,y)]
x y
1: 0.1969049 a
2: 0.7916696 a
3: 0.9095970 b
4: 0.3529506 b
5: 0.4923602 c
6: 0.5993034 c
7: 0.1559861 d
8: 0.9929333 d
9: 0.3980169 e
10: 0.1921226 e
But then I have to construct such a list, which is clunky.
Is subset
the proper way to conveniently drop a column or two, or does it cause a performance hit? If not, what's the better way?
Edit
Benchmarks:
> dat <- data.table( data.frame(x=runif(10^7),y=rep(letters[1:10],10^6),z=runif(10^7)),key='y' )
> microbenchmark( subset(dat,select=c(-z)), dat[,list(x,y)] )
Unit: milliseconds
expr min lq median uq max
1 dat[, list(x, y)] 102.62826 167.86793 170.72847 199.89789 792.0207
2 subset(dat, select = c(-z)) 33.26356 52.55311 53.53934 55.00347 180.8740
But really where it may matter more is for memory if subset
copies the whole data.table
.
Right-click the column you want to delete and choose Delete Column from the shortcut menu. If the column participates in a relationship (FOREIGN KEY or PRIMARY KEY), a message prompts you to confirm the deletion of the selected columns and their relationships. Choose Yes.
The SQL ALTER TABLE statement is used to add, modify, or drop/delete columns in a table.
To physically drop a column you can use one of the following syntaxes, depending on whether you wish to drop a single or multiple columns. alter table table_name drop column column_name; alter table table_name drop (column_name1, column_name2);
SQL DELETE column using SSMS object explorer In the particular database, locate the particular table and expand the columns. It shows all the columns of the particular table. Right-click on the column we want to remove and click on Delete as shown in the following image.
If you are wanting to remove the column permanently use := NULL
dat[, z := NULL]
If you have your columns to drop as a character string use ()
to force evaluation as a character string, not as the character name.
toDrop <- c('z')
dat[, (toDrop) := NULL]
If you want to limit the availability of the columns in .SD
, you can pass the .SDcols
argument
dat[,lapply(.SD, somefunction) , .SDcols = setdiff(names(dat),'z')]
However, data.table
inspects the j
arguments and only gets the columns you use any way. See FAQ 1.12
When you write X[Y,sum(foo*bar)], data.table automatically inspects the j expression to see which columns it uses.
and doesn't try and load all the data for .SD
(unless you have .SD
within your call to j
)
subset.data.table
is processing the call and eventually evaluating dat[, c('x','y'), with=FALSE]
using := NULL
should be basically instantaneous, howveer t does permanently delete the column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With