I fear greatly that this has been asked and will be downvoted, but I have not found the answer in the docs (?"["), and discovered that it is hard to search for.
data(wines)
# This is allowed:
alcoholic <- wines[, 1]
alcoholic <- wines[, "alcohol"]
nonalcoholic <- wines[, -1]
# But this is not:
fail <- wines[, -"alcohol"]
I know of two solutions, but I am frustrated that they are needed.
win <- wines[, !colnames(wines) %in% "alcohol"] # snappy
win <- wines[, -which(colnames(wines) %in% "alcohol")] # snappier!
Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your dataframe.
To remove one or more columns from an R data frame, use square bracket notation [] or functions from third-party packages such as dplyr.
There are three subsetting operators, [[ , [ , and $ . Subsetting operators interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames). Subsetting can be combined with assignment.
We reference a data frame column with the double square bracket "[[]]" operator. For example, to retrieve the ninth column vector of the built-in data set mtcars, we write mtcars[[9]].
When you do wines[, -1], the -1 is evaluated before it is used by [. As you know, the unary - operator won't work with an object of class character, so doing the same with "alcohol" will lead you to:
Error in -"alcohol" : invalid argument to unary operator
You can add the following to your alternatives:
wines[, -match("alcohol", colnames(wines))]
wines[, setdiff(colnames(wines), "alcohol")]
but you should know about the risks of negative indexing: see, for example, what happens if you mistype the name as "alcool". So your first suggestion and the last one here (@Ananda's) should be preferred. You might also want to write a function that will error out if you provide a name that is not part of your data.
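To make the risk concrete, here is a minimal sketch. It uses a made-up toy data frame rather than the wines data, and drop_cols is a hypothetical helper name, not something from base R or from the answer above.

```r
# Toy data frame standing in for wines (column names are made up).
df <- data.frame(alcohol = c(13.2, 12.4), ash = c(2.1, 2.4), hue = c(1.0, 0.9))

# Mistyped name with -which(): which() returns integer(0), and negating
# integer(0) is still integer(0), so *no* columns are selected at all.
typo <- df[, -which(colnames(df) %in% "alcool")]
ncol(typo)

# Hypothetical helper: drop columns by name, failing loudly on unknown names.
drop_cols <- function(x, cols) {
  unknown <- setdiff(cols, colnames(x))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even if one column remains.
  x[, setdiff(colnames(x), cols), drop = FALSE]
}

kept <- drop_cols(df, "alcohol")
colnames(kept)
```

With this helper, drop_cols(df, "alcool") stops with an error instead of silently returning the wrong columns.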
Another possibility:
subset(wines,select=-alcohol)
You can even do
subset(wines,select=-c(alcohol,other_drop))
In fact, if you have a contiguous set of columns you want to drop, you can even
subset(wines,select=-(first_drop:last_drop))
which can be handy (although IMO it depends dangerously on the order of columns, which is something that might be fragile: I might prefer a grep-based solution if there were some way to identify columns, or a more explicit separate definition of column groups).
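A grep-based variant might look like the sketch below. The data frame and the "ash_" naming pattern are assumptions made up for illustration; the point is only that the columns are identified by name rather than by position.

```r
# Toy data frame; assume the columns to drop share the prefix "ash_".
df <- data.frame(
  alcohol = c(13.2, 12.4),
  ash_raw = c(2.1, 2.4),
  ash_alkalinity = c(15.6, 11.2),
  hue = c(1.0, 0.9)
)

# grepl() returns a logical vector, so negating it with ! is safe even
# when nothing matches (unlike negating the integer result of grep()).
by_pattern <- df[, !grepl("^ash_", colnames(df)), drop = FALSE]
colnames(by_pattern)
```

Because the selection is by name, reordering the columns of df does not change which ones are dropped.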
In this case subset is using non-standard evaluation, which, as has been discussed elsewhere, can be dangerous in some contexts. But I still like it for simple, top-level data manipulation because of its readability.
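As a rough illustration of that trade-off, the sketch below contrasts the interactive style with a plain-indexing alternative for use inside functions, where the column name typically arrives as a string. The toy data frame and the drop_by_name helper are hypothetical.

```r
# Toy data frame (made up for illustration).
df <- data.frame(alcohol = c(13.2, 12.4), ash = c(2.1, 2.4), hue = c(1.0, 0.9))

# Interactive, top-level use: the bare name is looked up among the columns.
interactive_style <- subset(df, select = -alcohol)

# Inside a function, standard evaluation with setdiff() is more predictable,
# because the name is an ordinary character value.
drop_by_name <- function(x, name) {
  x[, setdiff(colnames(x), name), drop = FALSE]
}
programmatic_style <- drop_by_name(df, "alcohol")

colnames(interactive_style)
colnames(programmatic_style)
```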