Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idiom for dropping a single column in a data.table

Tags:

r

data.table

I need to drop one column from a data.frame containing a few hundred columns.

With a data.frame, I'd use subset to do this conveniently:

> dat <- data.table( data.frame(x=runif(10),y=rep(letters[1:5],2),z=runif(10)),key='y' )
> subset(dat,select=c(-z))
            x y
 1: 0.1969049 a
 2: 0.7916696 a
 3: 0.9095970 b
 4: 0.3529506 b
 5: 0.4923602 c
 6: 0.5993034 c
 7: 0.1559861 d
 8: 0.9929333 d
 9: 0.3980169 e
10: 0.1921226 e

Obviously this still works, but it seems like not a very data.table-like idiom. I could manually construct a list of the column names I wanted to keep, which seems a little more data.table-like:

> dat[,list(x,y)]
            x y
 1: 0.1969049 a
 2: 0.7916696 a
 3: 0.9095970 b
 4: 0.3529506 b
 5: 0.4923602 c
 6: 0.5993034 c
 7: 0.1559861 d
 8: 0.9929333 d
 9: 0.3980169 e
10: 0.1921226 e

But then I have to construct such a list, which is clunky.

Is subset the proper way to conveniently drop a column or two, or does it cause a performance hit? If not, what's the better way?

Edit

Benchmarks:

> dat <- data.table( data.frame(x=runif(10^7),y=rep(letters[1:10],10^6),z=runif(10^7)),key='y' )
> microbenchmark( subset(dat,select=c(-z)), dat[,list(x,y)] )
Unit: milliseconds
                         expr       min        lq    median        uq      max
1           dat[, list(x, y)] 102.62826 167.86793 170.72847 199.89789 792.0207
2 subset(dat, select = c(-z))  33.26356  52.55311  53.53934  55.00347 180.8740

But really where it may matter more is for memory if subset copies the whole data.table.

like image 858
Ari B. Friedman Avatar asked May 10 '13 00:05

Ari B. Friedman


People also ask

How do I drop one column in a table?

Right-click the column you want to delete and choose Delete Column from the shortcut menu. If the column participates in a relationship (FOREIGN KEY or PRIMARY KEY), a message prompts you to confirm the deletion of the selected columns and their relationships. Choose Yes.

What statement is used to drop a column from a table?

The SQL ALTER TABLE statement is used to add, modify, or drop/delete columns in a table.

Which syntax is true for dropping of one column in a table?

To physically drop a column you can use one of the following syntaxes, depending on whether you wish to drop a single or multiple columns. alter table table_name drop column column_name; alter table table_name drop (column_name1, column_name2);

How can a column be dropped from a table structure?

SQL DELETE column using SSMS object explorer In the particular database, locate the particular table and expand the columns. It shows all the columns of the particular table. Right-click on the column we want to remove and click on Delete as shown in the following image.


1 Answers

If you are wanting to remove the column permanently use := NULL

dat[, z := NULL]

If you have your columns to drop as a character string use () to force evaluation as a character string, not as the character name.

toDrop <- c('z')

dat[, (toDrop) := NULL]

If you want to limit the availability of the columns in .SD, you can pass the .SDcols argument

dat[,lapply(.SD, somefunction) , .SDcols = setdiff(names(dat),'z')]

However, data.table inspects the j arguments and only gets the columns you use any way. See FAQ 1.12

When you write X[Y,sum(foo*bar)], data.table automatically inspects the j expression to see which columns it uses.

and doesn't try and load all the data for .SD (unless you have .SD within your call to j)


subset.data.table is processing the call and eventually evaluating dat[, c('x','y'), with=FALSE]

using := NULL should be basically instantaneous, howveer t does permanently delete the column.

like image 190
mnel Avatar answered Sep 30 '22 17:09

mnel