
R dplyr: Drop multiple columns

Tags: r, dplyr

I have a data frame and a list of columns in that data frame that I'd like to drop. Let's use the iris dataset as an example. I'd like to drop Sepal.Length and Sepal.Width and use only the remaining columns. How do I do this using select or select_ from the dplyr package?

Here's what I've tried so far:

library(dplyr)

drop.cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% select(-drop.cols)

Error in -drop.cols : invalid argument to unary operator

iris %>% select_(.dots = -drop.cols)

Error in -drop.cols : invalid argument to unary operator

iris %>% select(!drop.cols)

Error in !drop.cols : invalid argument type

iris %>% select_(.dots = !drop.cols)

Error in !drop.cols : invalid argument type

I feel like I'm missing something obvious, because this seems like a pretty useful operation that should already exist. On GitHub, someone posted a similar issue, and Hadley said to use 'negative indexing'. That's what (I think) I've tried, but to no avail. Any suggestions?

Navaneethan Santhanam asked Mar 07 '16 08:03




3 Answers

Check the help on select_vars. That gives you some extra ideas on how to work with this.

In your case:

iris %>% select(-one_of(drop.cols))
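
In more recent dplyr releases (1.0.0 and later, as far as I know), one_of() is marked as superseded in favour of all_of() and any_of(). A minimal sketch of the same drop with those helpers, assuming a reasonably current dplyr is installed; the names strict and lenient and the column "not_a_column" are made up for illustration:

```r
library(dplyr)

drop.cols <- c("Sepal.Length", "Sepal.Width")

# all_of() is strict: it raises an error if any listed column is missing
strict <- iris %>% select(-all_of(drop.cols))

# any_of() is lenient: names that are not present are silently ignored
# ("not_a_column" is a hypothetical name used only to show this)
lenient <- iris %>% select(-any_of(c(drop.cols, "not_a_column")))

names(strict)
names(lenient)
```

Which one to prefer probably depends on whether a missing column should count as a bug (all_of) or as something to tolerate (any_of).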
phiver answered Oct 24 '22 00:10



Also try:

## Notice the lack of quotes
iris %>% select(-c(Sepal.Length, Sepal.Width))
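
For what it's worth, negating the columns one at a time should give the same result as negating the c() group. A small sketch that compares the two forms; the names grouped and separate are only for illustration:

```r
library(dplyr)

# Negate the whole group of bare column names at once
grouped <- iris %>% select(-c(Sepal.Length, Sepal.Width))

# Negate each bare column name as its own argument
separate <- iris %>% select(-Sepal.Length, -Sepal.Width)

# Compare the remaining column names of the two results
identical(names(grouped), names(separate))
```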
Miguel Rayon Gonzalez answered Oct 24 '22 00:10



Beyond select(-one_of(drop.cols)), there are a few other options for dropping columns with select() that do not require listing every column name. The examples below use the dplyr starwars sample data for more variety in column names:

starwars %>% 
  select(-(name:mass)) %>%        # the range of columns from 'name' to 'mass'
  select(-contains('color')) %>%  # any column name that contains 'color'
  select(-starts_with('bi')) %>%  # any column name that starts with 'bi'
  select(-ends_with('er')) %>%    # any column name that ends with 'er'
  select(-matches('^f.+s$')) %>%  # any column name matching the regex pattern
  select_if(~!is.list(.)) %>%     # not by column name but by data type
  head(2)

# A tibble: 2 x 2
  homeworld species
  <chr>     <chr>  
1 Tatooine  Human  
2 Tatooine  Droid 
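
As a side note, select_if() is marked as superseded in newer dplyr versions (1.0.0 onwards, I believe), where the type-based drop can be written with where() inside select(). A rough sketch under that assumption, which also shows that several negated helpers can go into a single select() call; the names no_lists and trimmed are made up for illustration:

```r
library(dplyr)

# Drop list-columns by type with where() instead of select_if()
no_lists <- starwars %>% select(!where(is.list))

# Several negated helpers can be combined in one select() call
trimmed <- starwars %>%
  select(-(name:mass), -contains("color"), -where(is.list))

names(trimmed)
```

The exact columns left over may differ between dplyr versions, since the starwars data has gained columns over time.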
sbha answered Oct 24 '22 01:10
