
Pandas: Comparing multiple columns to merge dataframes

I have two dataframes much larger than this, but they are in the form:

import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'start': [1, 5, 10, 15],
                    'end': [4, 9, 14, 19]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'value': [2, 6, 12, 20],
                    'etc': [1, 2, 3, 4]})

I want to merge them on two conditions, checked in this order: 1) col1 matches, and 2) value is between start and end. I was thinking of something like the following, but the first == line doesn't work:

if df1.col1 == df2.col1:
    if df1.start < df2.value < df1.end:
        df1.merge(df2)

I also don't know whether that would check every row in df1 against every row in df2. The desired output for this example would be:

dfoutput = pd.DataFrame({'col1': ['a', 'b', 'b'],
                        'start': [1, 5, 10],
                        'end': [4, 9, 14],
                        'value': [2, 6, 12],
                        'etc': [1, 2, 3]})
Liquidity asked Jun 13 '26 22:06



1 Answer

You can merge on col1 first, then keep only the rows where value lies between start and end:

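# Merge on the shared column (col1); each df1 row is paired with every df2 row that has the same col1
new_df = df1.merge(df2)
# Blank out rows where value falls outside [start, end], then drop them
new_df.where(new_df.value.between(new_df.start, new_df.end)).dropna()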


    col1    start   end     value   etc
0   a       1.0     4.0     2.0     1.0
1   b       5.0     9.0     6.0     2.0
4   b       10.0    14.0    12.0    3.0
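Note that where(...).dropna() turns the integer columns into floats and would also drop any row that already contains a NaN in some other column. A minimal sketch of the same idea using a boolean mask, which keeps the original dtypes, is shown below. The names merged, in_range and result are only illustrative, and the data is the sample from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'start': [1, 5, 10, 15],
                    'end': [4, 9, 14, 19]})
df2 = pd.DataFrame({'col1': ['a', 'b', 'b', 'c'],
                    'value': [2, 6, 12, 20],
                    'etc': [1, 2, 3, 4]})

# Pair every df1 row with every df2 row that shares the same col1.
merged = df1.merge(df2, on='col1')

# True where value lies in [start, end]; between() includes both endpoints by default.
in_range = merged['value'].between(merged['start'], merged['end'])

# Select the matching rows directly, so no NaN is introduced and dtypes stay integer.
result = merged[in_range].reset_index(drop=True)

print(result)
```

If the bounds should be exclusive, as in the start < value < end check from the question, between() accepts inclusive='neither' in pandas 1.3 and later. Keep in mind that the merge builds every same-col1 pairing before filtering, so with very large frames and few distinct col1 values the intermediate result can get big.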
Vaishali answered Jun 18 '26 01:06



