I have two dataframes much larger than this, but they are in the form:
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'start': [1, 5, 10, 15],
                    'end': [4, 9, 14, 19]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'value': [2, 6, 12, 20],
                    'etc': [1, 2, 3, 4]})
I want to merge them based on checking two things in this order: 1) that col1 matches, 2) that value is between start and end. I was thinking something like (but the first == line doesn't work):
if df1.col1 == df2.col1:
    if df1.start < df2.value < df1.end:
        df1.merge(df2)
I don't know if that will check all lines in df1 against all lines in df2 though? The desired output for this example would be:
dfoutput = pd.DataFrame({'col1': ['a', 'b', 'b'],
                         'start': [1, 5, 10],
                         'end': [4, 9, 14],
                         'value': [2, 6, 12],
                         'etc': [1, 2, 3]})
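You can merge on col1 first, which pairs every row of df1 with every row of df2 that shares the same col1, and then keep only the rows where value lies between start and end. Note that between is inclusive at both ends by default. Filtering with a boolean mask keeps the integer columns as integers:
new_df = df1.merge(df2, on='col1')
new_df[new_df.value.between(new_df.start, new_df.end)]
  col1  start  end  value  etc
0    a      1    4      2    1
1    b      5    9      6    2
4    b     10   14     12    3
The c row is dropped because its value (20) is outside the range 15 to 19, which matches the desired output.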
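The merge-then-filter approach above can be sketched end to end as follows. This is only an illustration under a few assumptions: `merge_in_range` is a hypothetical helper name that is not part of pandas, the bounds are treated as strict to mirror the `<` comparisons in the question, and the index is reset so the result lines up with the desired output.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'start': [1, 5, 10, 15],
                    'end': [4, 9, 14, 19]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'value': [2, 6, 12, 20],
                    'etc': [1, 2, 3, 4]})


def merge_in_range(left, right):
    """Hypothetical helper: join on col1, keep rows with start < value < end."""
    # Step 1: the inner merge pairs every left row with every right row
    # that has the same col1 (so the two 'b' rows produce four pairs).
    merged = left.merge(right, on='col1')
    # Step 2: element-wise strict range check; chained comparisons such as
    # a < b < c do not work on Series, so the two tests are combined with &.
    mask = (merged['start'] < merged['value']) & (merged['value'] < merged['end'])
    # Step 3: keep matching rows and renumber them from 0.
    return merged[mask].reset_index(drop=True)


result = merge_in_range(df1, df2)
print(result)
```

Because the merge builds every same-key pair before filtering, the intermediate frame can grow quickly when a key repeats many times in both inputs, which may matter for much larger dataframes.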