
Why is negative subsetting with [ (i.e. deletion) of columns not possible with names?

I fear greatly that this has been asked and will be downvoted, but I have not found the answer in the docs (?"["), and discovered that it is hard to search for.

data(wines)
# This is allowed:
alcoholic <- wines[, 1]
alcoholic <- wines[, "alcohol"]
nonalcoholic <- wines[, -1]
# But this is not:
fail <- wines[, -"alcohol"]

I know of two solutions, but am frustrated for need of them.

win <- wines[, !colnames(wines) %in% "alcohol"]  # snappy
win <- wines[, -which(colnames(wines) %in% "alcohol")]  # snappier!
asked Sep 05 '13 10:09 by a different ben


2 Answers

When you do

wines[, -1]

-1 is evaluated before it is used by [. As you know, the unary - operator won't work with an object of class character, so doing the same with "alcohol" will lead you to:

Error in -"alcohol" : invalid argument to unary operator
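
A small sketch of that evaluation order, using a made-up stand-in for wines (the columns other than alcohol are assumptions, not the real data set). The negation is computed on its own before [ ever sees it, so the failure can be reproduced with no subsetting at all:

```r
# Toy stand-in for `wines`; columns other than `alcohol` are invented.
wines <- data.frame(alcohol = c(12.5, 13.1),
                    sugar   = c(2.1, 1.8),
                    pH      = c(3.3, 3.4))

# The index is an ordinary value, computed before `[` is called.
idx <- -1
same <- identical(wines[, idx], wines[, -1])
same

# Negating a string fails by itself; `[` is never reached.
msg <- tryCatch(-"alcohol", error = function(e) conditionMessage(e))
msg
```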

You can add the following to your alternatives:

wines[, -match("alcohol", colnames(wines))]
wines[, setdiff(colnames(wines), "alcohol")]

but you should know about the risks of negative indexing: see, for example, what happens if you mistype the name as "alcool". So your first suggestion and the last one here (@Ananda's) should be preferred. You might also want to write a function that throws an error if you provide a name that is not part of your data.
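
To make the risk concrete, here is a minimal sketch on a toy data frame (the data and the helper name drop_cols are hypothetical, introduced only for illustration). With a mistyped name, which() returns an empty index, and negating an empty index selects nothing rather than everything:

```r
# Toy stand-in for `wines`; columns other than `alcohol` are invented.
wines <- data.frame(alcohol = c(12.5, 13.1),
                    sugar   = c(2.1, 1.8),
                    pH      = c(3.3, 3.4))

# Typo in the name: -integer(0) is still integer(0), so no column survives.
n_left <- ncol(wines[, -which(colnames(wines) %in% "alcool")])
n_left

# Hypothetical helper: drop columns by name, stopping on unknown names.
drop_cols <- function(df, cols) {
  unknown <- setdiff(cols, colnames(df))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  df[, setdiff(colnames(df), cols), drop = FALSE]
}

colnames(drop_cols(wines, "alcohol"))
```

The logical (!... %in% ...) and setdiff() forms fail in the other direction: a mistyped name leaves the data unchanged, which is usually the less harmful outcome but is still silent, hence the explicit check.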

answered Oct 15 '22 21:10 by flodel


Another possibility:

subset(wines, select = -alcohol)

You can even do

subset(wines, select = -c(alcohol, other_drop))

In fact, if you have a contiguous set of columns you want to drop, you can even

subset(wines, select = -(first_drop:last_drop))

which can be handy (although IMO it depends dangerously on the order of columns, which is something that might be fragile: I might prefer a grep-based solution if there were some way to identify columns, or a more explicit separate definition of column groups).
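
As a rough idea of what a grep-based alternative could look like, assuming the columns to drop share a recognisable pattern (the acid_ prefix and the data below are invented for the example):

```r
# Toy data; the `acid_` prefix is a hypothetical naming convention.
wines <- data.frame(alcohol       = c(12.5, 13.1),
                    acid_malic    = c(1.7, 1.9),
                    acid_tartaric = c(3.1, 2.9),
                    pH            = c(3.3, 3.4))

# Keep every column whose name does not start with "acid_",
# regardless of where those columns sit in the data frame.
keep <- !grepl("^acid_", colnames(wines))
kept <- wines[, keep, drop = FALSE]
colnames(kept)
```

Because this uses a logical mask rather than negative positions, a pattern that matches nothing simply keeps all columns.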

In this case subset is using non-standard evaluation, which, as has been discussed elsewhere, can be dangerous in some contexts. But I still like it for simple, top-level data manipulation because of its readability.
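
One way this can bite, shown as an illustration only (it is not necessarily the case the discussions referred to above have in mind): select treats bare names as columns, so a column name held in a variable is not handled the same way. The data and the variable drop_name are made up for the sketch.

```r
# Toy stand-in for `wines`; columns other than `alcohol` are invented.
wines <- data.frame(alcohol = c(12.5, 13.1),
                    sugar   = c(2.1, 1.8),
                    pH      = c(3.3, 3.4))

# Bare column names work fine at top level.
top_level <- colnames(subset(wines, select = -alcohol))
top_level

# A name stored in a variable is just a string to `select`,
# and negating a string is an error.
drop_name <- "alcohol"
res <- tryCatch(subset(wines, select = -drop_name),
                error = function(e) "error")
res

# Standard evaluation handles the programmatic case directly.
programmatic <- colnames(wines[, setdiff(colnames(wines), drop_name), drop = FALSE])
programmatic
```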

answered Oct 15 '22 20:10 by Ben Bolker