Consider removing the elements of a vector that match a certain set of criteria. The expected behaviour is to remove those that match and, in particular, to remove none if none match:
> d = 1:20
> d
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> d[-which(d > 10)]
[1] 1 2 3 4 5 6 7 8 9 10
> d[-which(d > 100)]
integer(0)
We see here that the final statement has both done something very unexpected and silently hidden the error without even a warning.
I initially thought that this was an undesirable (but consistent) consequence of the choice that an empty index selects all elements of a vector
http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.html
as is commonly used to e.g. select the first column of a matrix, m, by writing
m[ , 1]
However, the behaviour observed here is consistent with interpreting an empty vector as "no elements", not "all elements":
> v = c(1.5, 2.5, 3.5)
> a = integer(0)
selecting "no elements" works exactly as expected:
> v[a]
numeric(0)
however removing "no elements" does not:
> v[-a]
numeric(0)
It is inconsistent for an empty vector both to select no elements and to remove all elements.
Obviously it is possible to work around this issue, either by checking that which() returns a result of non-zero length or by using a logical expression, as covered in the question "In R, why does deleting rows or cols by empty index results in empty data ? Or, what's the 'right' way to delete?",
but my two questions are:
This doesn't work because which(d > 100) and -which(d > 100) are identical: there is no difference between an empty vector and the negative of that empty vector.
For example, imagine you did:
d = 1:10
indexer = which(d > 100)
negative_indexer = -indexer
The two variables would be the same. This is the only consistent behavior: negating every element of an empty vector leaves it unchanged, since it has no elements.
indexer
#> integer(0)
negative_indexer
#> integer(0)
identical(indexer, negative_indexer)
#> [1] TRUE
At that point, you couldn't expect d[indexer] and d[negative_indexer] to give different results. There is also no place to raise an error or warning: when R is passed an empty vector, it cannot know that you "meant" the negative version of that empty vector.
The solution is that for subsetting there's no reason you need which() at all: you could use d[d > 10] instead of your original example. You could therefore use !(d > 100) or d <= 100 for your negative indexing. This behaves as you'd expect because d > 10 or !(d > 100) are logical vectors rather than vectors of indices.
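As a rough sketch of the above, here is the logical-indexing version of the original example, plus one way to drop elements safely when you already hold positions from which(). Note that drop_at() is a hypothetical helper written for this illustration, not a base R function, and the example assumes d contains no NA values (logical indexing with NA returns NA elements, whereas which() drops them).

```r
d <- 1:20

# Logical indexing: keep the elements that do NOT match the condition.
keep_small <- d[!(d > 10)]    # removes 11..20
keep_all   <- d[!(d > 100)]   # nothing matches, so nothing is removed

# If positions are all you have (e.g. the result of which()), take the
# set difference against the full index range instead of negating.
# An empty idx then means "drop nothing".
# drop_at() is a hypothetical helper, not part of base R.
drop_at <- function(x, idx) x[setdiff(seq_along(x), idx)]

dropped_some <- drop_at(d, which(d > 10))
dropped_none <- drop_at(d, which(d > 100))

print(keep_all)
print(dropped_none)
```

Both approaches avoid the empty-vector case entirely, so there is no need for a separate length check before subsetting.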