How can I get the rows of a dataframe that fit between the ranges of another dataframe? For example:
import pandas as pd
df1 = pd.DataFrame({
'date': [
pd.Timestamp(2019,1,1),
pd.Timestamp(2019,1,2),
pd.Timestamp(2019,1,3),
pd.Timestamp(2019,2,1),
pd.Timestamp(2019,2,5)
]
})
df2 = pd.DataFrame({
'from_date': [pd.Timestamp(2019,1,1), pd.Timestamp(2019,2,1)],
'to_date': [pd.Timestamp(2019,1,2), pd.Timestamp(2019,2,1)]
})
Data:
> df1
date
0 2019-01-01 <- I want this
1 2019-01-02 <- and this
2 2019-01-03
3 2019-02-01 <- and this
4 2019-02-05
> df2
from_date to_date
0 2019-01-01 2019-01-02
1 2019-02-01 2019-02-01
The ranges can overlap each other. I want to find all rows in df1 that fall within any of the ranges in df2. I tried:
df1[df1['date'].between(df2['from_date'], df2['to_date'])]
But that resulted in an error:
ValueError: Can only compare identically-labeled Series objects
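I would use numpy broadcasting:
import numpy as np
s2_1 = df2.from_date.values
s2_2 = df2.to_date.values
# Column vector of shape (n, 1), so it broadcasts against the (m,) range bounds
s1 = df1['date'].values[:, None]
# Keep the rows whose date lies inside at least one range
df1[np.any((s1 >= s2_1) & (s1 <= s2_2), axis=1)]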
Out[35]:
date
0 2019-01-01
1 2019-01-02
3 2019-02-01
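For reference, here is a self-contained sketch of the same broadcasting idea that can be pasted and run as is. The helper name `rows_in_any_range` is made up for this example and is not a pandas or numpy function; the range bounds are assumed to be inclusive, as in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"date": pd.to_datetime(
    ["2019-01-01", "2019-01-02", "2019-01-03", "2019-02-01", "2019-02-05"])})
df2 = pd.DataFrame({
    "from_date": pd.to_datetime(["2019-01-01", "2019-02-01"]),
    "to_date": pd.to_datetime(["2019-01-02", "2019-02-01"]),
})

def rows_in_any_range(dates, starts, ends):
    """Return a boolean mask that is True where a date lies in at least
    one inclusive [start, end] range (hypothetical helper)."""
    d = dates.to_numpy()[:, None]   # shape (n, 1)
    lo = starts.to_numpy()          # shape (m,)
    hi = ends.to_numpy()            # shape (m,)
    # Broadcasting yields an (n, m) table: row i, column j answers
    # "is date i inside range j?"; any(axis=1) collapses the ranges.
    return ((d >= lo) & (d <= hi)).any(axis=1)

mask = rows_in_any_range(df1["date"], df2["from_date"], df2["to_date"])
result = df1[mask]
print(result)
```

Note that the intermediate array has one entry per (row, range) pair, so memory grows with len(df1) * len(df2). For very large inputs a different approach may be preferable.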