
How do you remove columns from a data.frame?

Tags:

dataframe

r

Not so much 'How do you...?' but more 'How do YOU...?'

If someone gives you a file with 200 columns and you want to reduce it to the few you need for analysis, how do you go about it? Does one solution offer benefits over another?

Assume we have a data frame with columns col1, col2 through col200. If you only wanted columns 1-100, 125-135 and 150-200, you could:

dat$col101 <- NULL
dat$col102 <- NULL
# etc.

or

dat <- dat[,c("col1","col2",...)] 

or

dat <- dat[,c(1:100,125:135,...)] # shortest probably but I don't like this 

or

dat <- dat[,!names(dat) %in% c("col101","col102",...)] 
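A minimal, self-contained sketch of the approaches above, plus subset() (mentioned in the answer below), assuming a made-up 3-row data frame whose columns are literally named col1 ... col200. The helper names keep_idx, keep_nms and drop_nms are illustrative only. It checks that every approach keeps the same columns.

```r
# Made-up example data: 3 rows, columns named col1 ... col200
dat <- as.data.frame(matrix(seq_len(600), nrow = 3,
                            dimnames = list(NULL, paste0("col", 1:200))))

keep_idx <- c(1:100, 125:135, 150:200)     # columns to keep, by position
keep_nms <- paste0("col", keep_idx)         # the same columns, by name
drop_nms <- setdiff(names(dat), keep_nms)   # every other column

a <- dat[, keep_idx]                        # keep by position
b <- dat[, keep_nms]                        # keep by name
d <- dat[, !names(dat) %in% drop_nms]       # drop by name with %in%

e <- dat
e[drop_nms] <- NULL                         # NULL-assign several columns at once

# subset() evaluates select= with column names bound to their positions,
# so name ranges such as col101:col124 work there
f <- subset(dat, select = -c(col101:col124, col136:col149))

# All five approaches keep the same 162 columns, in the same order
stopifnot(identical(names(a), keep_nms), identical(names(b), keep_nms),
          identical(names(d), keep_nms), identical(names(e), keep_nms),
          identical(names(f), keep_nms))
```

One trade-off worth noting: selecting by position is the most compact, but it silently keeps the wrong columns if the file's column order ever changes, whereas selecting a missing name with dat[, c(...)] stops with "undefined columns selected".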

Anything else I'm missing? I know this is slightly subjective, but it's one of those nitty-gritty things where you might dive in, start doing it one way and fall into a habit when there are far more efficient ways out there. Much like this question about which().

EDIT:

Or, is there an easy way to create a workable vector of column names? names(dat) doesn't print them with commas in between, which you need in the code examples above, so if you print the names that way you get spaces everywhere and have to put in the commas manually. Is there a command that will give you "col1","col2","col3",... as your output so you can easily grab what you want?
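One common answer is dput(), which prints R code that recreates an object, so for a character vector the names come out already quoted and comma-separated. A minimal sketch on a made-up three-column data frame; name_list is an illustrative name, and the paste0() line is an alternative if you want exactly the "col1","col2",... form without the c( ) wrapper.

```r
# Made-up small data frame standing in for the 200-column file
dat <- data.frame(col1 = 1:2, col2 = 3:4, col3 = 5:6)

# Prints: c("col1", "col2", "col3")  (ready to paste back into code)
dput(names(dat))

# Build the bare quoted, comma-separated list yourself
name_list <- paste0('"', names(dat), '"', collapse = ",")
writeLines(name_list)   # prints: "col1","col2","col3"
```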

nzcoops asked Aug 16 '11 at 00:08



1 Answer

I use data.table's := operator to delete columns instantly regardless of the size of the table.

DT[, coltodelete := NULL] 

or

DT[, c("col1","col20") := NULL] 

or

DT[, (125:135) := NULL] 

or

DT[, (variableHoldingNamesOrNumbers) := NULL] 

Any solution using <- or subset will copy the whole table. data.table's := operator merely modifies the internal vector of pointers to the columns, in place. That operation is therefore (almost) instant.
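A sketch of the question's 200-column scenario done with :=, assuming the data.table package is installed. The table, drop_cols and addr_before are made up for illustration. data.table's address() is used to check that DT is still the same object after the deletion, which is what "in place" means here.

```r
library(data.table)

# Made-up 3-row table with columns col1 ... col200
DT <- as.data.table(matrix(seq_len(600), nrow = 3,
                           dimnames = list(NULL, paste0("col", 1:200))))

addr_before <- address(DT)   # memory address of the table before deleting

# Everything except columns 1-100, 125-135 and 150-200
drop_cols <- paste0("col", c(101:124, 136:149))

# Parentheses around drop_cols tell := to use the variable's contents
# as the column names, rather than a column literally called "drop_cols"
DT[, (drop_cols) := NULL]

ncol(DT)                              # 162 columns remain
identical(address(DT), addr_before)   # TRUE: same object, modified in place
```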

Matt Dowle answered Sep 17 '22 at 13:09