How do I invert the helper functions for dplyr::select() (like matches() or contains()) so that I can select variables that do NOT contain or match a particular string?
For example, say I wanted to select all the columns in the mtcars data frame that did not have the letter "m" in them. I could imagine doing something like:
mtcars %>%
select( !matches("m") )
But that throws the error:
Error: !matches("m") must resolve to integer column positions, not a logical vector
How do I write the helper function to invert it?
Important note: one possibility is to use matches() and write a regular expression that doesn't match, but I'm more interested in finding a way to maintain the simplicity of the helper functions but invert the selection they return, rather than solving the actual "how do I select such-and-such" problem.
The helper functions for select(), like matches(), contains(), starts_with() and so on, return a vector of index values. In the example above, if we didn't want the inverse, matches("m") would return c(1, 9) because the first and ninth column names ("mpg" and "am") contain "m".
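To see where those positions come from, here is a minimal base R sketch that mimics the lookup with grep(). It is only an illustration of the idea, not how dplyr implements matches() internally; the variable name idx is made up for the example.

```r
# Base R stand-in for matches("m"): positions of column names containing "m".
# mtcars ships with R, so no packages are needed.
idx <- grep("m", names(mtcars))

idx                  # 1 9
names(mtcars)[idx]   # "mpg" "am"
```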
With that in mind, all we have to do is negate the helper's result with a minus sign:
mtcars %>%
select( -matches("m") )
That makes matches("m") return a vector of c(-1, -9), which deselects those columns but leaves everything else.
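The same effect can be sketched in base R, where negative positions also mean "drop these columns". This assumes the positions 1 and 9 found for "m"; idx and kept are names invented for the example.

```r
# Negative positions drop columns instead of keeping them.
idx  <- grep("m", names(mtcars))   # 1 9
kept <- mtcars[, -idx]             # same as mtcars[, c(-1, -9)]

names(kept)
# "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "gear" "carb"
```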
Using !, the boolean NOT, as shown in the original example, coerces the integer values to logical, so instead of c(1, 9), you end up with c(FALSE, FALSE), since both 1 and 9 coerce to TRUE but then are inverted by the !.
This explains the error R was throwing above: select() wants a vector of integers, corresponding to column indexes, not a vector of logical values.
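The difference between the two operators is easy to check at the console. A small sketch, using a hand-written vector in place of the helper's return value (pos, negated and dropped are illustrative names):

```r
pos <- c(1L, 9L)     # what matches("m") resolves to for mtcars

negated <- !pos      # logical NOT: non-zero numbers count as TRUE, then get flipped
dropped <- -pos      # arithmetic negation: still integers, usable as positions

negated              # FALSE FALSE
dropped              # -1 -9
```

This describes the behavior of the dplyr version that produced the error in the question; newer releases may treat ! differently, so -matches("m") is the form described here.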