
R drop by empty index on vector inconsistent behaviour

Tags:

r

Consider removing those elements from a vector that match a certain set of criteria. The expected behaviour is to remove those that match and, in particular, if none match, to remove none:

> d = 1:20
> d
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> d[-which(d > 10)]
 [1]  1  2  3  4  5  6  7  8  9 10
> d[-which(d > 100)]
integer(0)

We see here that the final statement has both done something very unexpected and silently hidden the error without even a warning.

I initially thought that this was an undesirable (but consistent) consequence of the choice that an empty index selects all elements of a vector

http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.html

as is commonly used to e.g. select the first column of a matrix, m, by writing

m[ , 1]

However the behaviour observed here is consistent with interpreting an empty vector as "no elements", not "all elements":

> a = integer(0)

selecting "no elements" from a numeric vector v works exactly as expected:

> v[a]
numeric(0)

however removing "no elements" does not:

> v[-a]
numeric(0)

For an empty vector to both select no elements and remove all elements requires inconsistency.

Obviously it is possible to work around this issue, either by checking that which() returns a result of non-zero length or by using a logical expression, as covered in "In R, why does deleting rows or cols by empty index results in empty data? Or, what's the 'right' way to delete?"

but my two questions are:

  1. Why is the behaviour inconsistent?
  2. Why does it silently do the wrong thing without an error or warning?
user2711915 asked May 03 '26 06:05


1 Answer

This doesn't work because which(d > 100) and -which(d > 100) evaluate to the same value: there is no difference between an empty vector and the negative of that empty vector.

For example, imagine you did:

d = 1:10

indexer = which(d > 100)
negative_indexer = -indexer

The two variables would be identical, which is the only consistent behavior: negating every element of an empty vector leaves it unchanged, since it has no elements.

indexer
#> integer(0)
negative_indexer
#> integer(0)
identical(indexer, negative_indexer)
#> [1] TRUE

At that point, you couldn't expect d[indexer] and d[negative_indexer] to give different results. There is also no place to raise an error or warning: when the subsetting operator is passed an empty vector, it cannot know that you "meant" the negative version of that empty vector.
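To make this concrete, here is a small sketch that reuses the variable names from the example above (keep is a name added only for illustration). By the time `[` is called, the minus sign has already been applied and nothing of it remains in the value:

```r
d <- 1:10
indexer <- which(d > 100)      # integer(0): nothing matches
negative_indexer <- -indexer   # still integer(0)

# Both calls hand `[` the very same value, so both select nothing.
d[indexer]
d[negative_indexer]

# With a non-empty index the sign survives, so the results differ.
keep <- which(d > 8)   # positions 9 and 10
d[keep]                # 9 10
d[-keep]               # 1 2 3 4 5 6 7 8
```

The asymmetry only shows up in the empty case, because that is the only case where negation changes nothing.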


The solution is that subsetting does not need which() at all: d[which(d > 10)] can always be written as d[d > 10]. To drop elements, negate the condition instead of the index: d[!(d > 100)] or d[d <= 100]. This behaves as you'd expect because d > 10 and !(d > 100) are logical vectors rather than vectors of indices, so a condition that matches nothing is still a full-length vector (all FALSE), not an empty one.
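A minimal sketch of the logical-index approach, plus one possible way to keep which() when integer positions are really needed. The helper name drop_at is hypothetical (it is not a base R function); it only relies on seq_along() and setdiff() from base R:

```r
d <- 1:20

# Logical indexing: a condition that matches nothing removes nothing.
d[!(d > 100)]   # all 20 elements
d[d <= 100]     # same result

# Hypothetical helper: drop the elements at integer positions `idx`,
# treating an empty `idx` as "drop nothing".
drop_at <- function(x, idx) {
  x[setdiff(seq_along(x), idx)]
}

drop_at(d, which(d > 10))    # elements 1 to 10
drop_at(d, which(d > 100))   # all 20 elements
```

The helper works because it builds the positive set of positions to keep, so an empty idx simply leaves every position in place.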

David Robinson answered May 05 '26 22:05


