I have two data frames:
df
where for single x
there are multiple values y
matchDf
which defines constraints I would like to use to filter df
I would like to filter only rows where y
is lower or equal to max_y
given that x
falls in interval <x_from, x_to)
Is there any way to do this in R (without SQL I used below)?
df <- read.table(header = TRUE, text = '
x y
100 0.1
100 0.2
100 0.3
250 0.2
250 0.3
250 0.4
375 0.2
375 0.25
375 0.35
420 0.15
420 0.16
420 0.17
500 0.23
500 0.55')
matchDf <- read.table(header = TRUE, text = '
x_from x_to max_y
0 300 .2
300 500 .3
500 99999 .5
')
library(sqldf)
sqldf('select a.*
from
df a
join matchDf b on (a.x >= b.x_from
and a.x < b.x_to
and a.y <= b.max_y)'
)
Try this:
df[df$y <= matchDf$max_y[cut(df$x, c(0,matchDf$x_to))],]
x y
1 100 0.10
2 100 0.20
4 250 0.20
7 375 0.20
8 375 0.25
10 420 0.15
11 420 0.16
12 420 0.17
13 500 0.23
What's happening here is that cut
is giving you which row in matchDf
each observation in df
should be in. Then, you just use that as a positional row extraction vector, and state the conditional relationship for y
using <=
.
To see how cut
works, just pull it out of the expression:
> cut(df$x, c(0,matchDf$x_to))
[1] (0,300] (0,300] (0,300] (0,300] (0,300] (0,300] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500]
Levels: (0,300] (300,500] (500,1e+05]
The level labels are irrelevant, because [
uses the underlying integer values to extract.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With