
Finding which element of a vector is between two values in R

Tags:

r

vector

I have two vectors x and y. I would like to find which elements of x are between the two elements of vector y. How can I do it in R?

x = c( .2, .4, 2.1, 5.3, 6.7, 10.5)
y = c( 1, 7)

I have written the following code, but it does not give me the correct result.

> x = x[ x >= y[1] && x <= y[2]]
> x
numeric(0)

Result should be like this:

res = c(2.1, 5.3, 6.7)
rose asked Dec 23 '13 23:12



2 Answers

You are looking for &, not &&:

x = c( .2, .4, 2.1, 5.3, 6.7, 10.5)
y = c( 1, 7)
x = x[ x >= y[1] & x <= y[2]]
x
# [1] 2.1 5.3 6.7

Edited to explain. Here's the text from ?'&':

& and && indicate logical AND and | and || indicate logical OR. 
The shorter form performs elementwise comparisons in much the same way as arithmetic operators. 
The longer form evaluates left to right examining only the first element of each vector. 
Evaluation proceeds only until the result is determined. 

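So when you used &&, only the first element of each side was examined: x[1] >= y[1] is FALSE, so the whole condition collapsed to a single FALSE and x[FALSE] returned numeric(0). (Since R 4.3.0, using && with a vector longer than one is an error rather than a silent truncation.)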
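To make the difference concrete, here is a minimal sketch using the vectors from the question. The names in_range, first and status are only illustrative. It keeps && to single values, which is what it is meant for (for example inside if()).

```r
x <- c(0.2, 0.4, 2.1, 5.3, 6.7, 10.5)
y <- c(1, 7)

# Elementwise AND: one TRUE/FALSE per element of x
in_range <- x >= y[1] & x <= y[2]
res <- x[in_range]
print(res)

# && is intended for single TRUE/FALSE values, e.g. an if() condition
first <- x[1]
status <- if (first >= y[1] && first <= y[2]) "inside" else "outside"
print(status)
```

Indexing x with the logical vector from & keeps exactly the elements where both comparisons hold.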

josliber answered Nov 08 '22 09:11



The dplyr and data.table packages each include a between() convenience function:

between {dplyr}

This is a shortcut for x >= left & x <= right, implemented efficiently in C++ for local values, and translated to the appropriate SQL for remote tables.

between {data.table}

between is equivalent to x >= lower & x <= upper when incbounds=TRUE, or x > lower & x < upper when FALSE

To return the desired values:

x[between(x, min(y), max(y))]

Another option, using findInterval:

x[findInterval(x,y)==1L]
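One caveat worth checking: by default findInterval treats each interval as closed on the left and open on the right, so a value exactly equal to the upper bound is not assigned to interval 1. The sketch below uses a small made-up vector pts (not from the question) to show this, and the rightmost.closed = TRUE argument, which makes the last interval closed so the result matches an inclusive comparison.

```r
y <- c(1, 7)
pts <- c(0.2, 1, 5, 7, 10.5)  # illustrative values, including both bounds

# Default: intervals are [1, 7), so 7 falls outside interval 1
default_idx <- findInterval(pts, y)
half_open <- pts[default_idx == 1L]

# rightmost.closed = TRUE treats the last interval as [1, 7]
closed_idx <- findInterval(pts, y, rightmost.closed = TRUE)
closed <- pts[closed_idx == 1L]

print(half_open)
print(closed)
```

With the vector from the question no element equals 7, so both variants return the same values there.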

There appears to be a slight (microseconds) speed advantage for findInterval using the author's original vector:

Unit: microseconds

               expr    min     lq     mean  median      uq     max neval
dplyr::between      14.078 14.839 20.37472 18.6435 20.5455  60.876   100
data.table::between 58.593 61.637 73.26434 68.2950 78.3780 160.560   100
findInterval         3.805  4.566  6.52944  5.7070  6.6585  35.385   100

Updated with a large vector:

x <- runif(1e8, 0, 10)
y <- c(1, 7)

The results show a slight advantage for data.table with a large vector, but in practice the timings are close enough that I'd use whichever package you already have loaded:

Unit: seconds

              expr         min       lq     mean   median       uq      max neval
dplyr::between        1.879269 1.926350 1.969953 1.947727 1.995571 2.509277   100
data.table::between   1.064609 1.118584 1.166563 1.146663 1.202884 1.800333   100
findInterval          2.207620 2.273050 2.337737 2.334711 2.393277 2.763117   100
x>=min(y) & x<=max(y) 2.350481 2.429235 2.496715 2.486349 2.542527 2.921387   100
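The benchmark code itself is not shown above. As a rough way to try a comparison on your own machine, here is a base-R-only sketch using system.time. It is an assumption, not the setup that produced the tables, and it uses a smaller vector (1e6 elements, named x_big here) so that it runs quickly. It covers only the two base R approaches and checks that they select the same elements.

```r
set.seed(1)                      # reproducible random input
x_big <- runif(1e6, 0, 10)       # smaller than the 1e8 vector above
y <- c(1, 7)

# Plain logical comparison
t_cmp <- system.time(
  r_cmp <- x_big[x_big >= min(y) & x_big <= max(y)]
)

# findInterval with the upper bound included
t_fi <- system.time(
  r_fi <- x_big[findInterval(x_big, y, rightmost.closed = TRUE) == 1L]
)

print(t_cmp)
print(t_fi)
print(identical(r_cmp, r_fi))    # both approaches should select the same elements
```

Timings from a single system.time call are noisy, so repeat the measurement several times before drawing conclusions.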
manotheshark answered Nov 08 '22 07:11
