Merge two dataframes with some common columns, where the common columns are combined using a custom function


My question is very similar to "Merge pandas dataframe, with column operation", but that question doesn't answer my needs.

Let's say I have two dataframes such as (note that the dataframe content could be float numbers instead of booleans):

left = pd.DataFrame({0: [True, True, False], 0.5: [False, True, True]}, index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [True, False, False], 0.5: [True, False, True]}, index=[12.5, 14, 15.5])

right

        0.5    0.7
12.5   True   True
14.0  False  False
15.5   True  False

left

        0.0    0.5
12.5   True  False
14.0   True   True
15.5  False   True

As you can see, they have the same index and one of the columns is common. In real life there might be more common columns (for example one more at 1.0, or other numbers not yet defined) and more unique columns on each side. I need to combine the two dataframes such that all unique columns are kept and the common columns are combined using a specific function, e.g. a boolean OR for this example. The indexes are always identical for both dataframes.

So the result should be:

result

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

In real life there will be more than two dataframes that need to be combined, but they can be combined sequentially one after the other to an empty first dataframe.

I feel pandas.combine might do the trick, but I can't figure it out from the documentation. Does anybody have a suggestion on how to do it in one or more steps?


asked Oct 08 '18 14:10

Toady

1 Answer

You can concatenate the dataframes and then group by the column names to apply an operation to the identically named columns. In this case you can get away with taking the sum and then casting back to bool to get the OR operation.

import pandas as pd

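# Pass axis by keyword; recent pandas versions reject it as a positional argument to concat
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).sum().astype(bool)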

Output:

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False
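
As far as I know, grouping with axis=1 is deprecated in newer pandas releases (since 2.1), so here is a sketch of the same idea that avoids it by transposing first. It assumes the left and right frames from the question; the variable name result is just for illustration.

```python
import pandas as pd

left = pd.DataFrame({0: [True, True, False], 0.5: [False, True, True]},
                    index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [True, False, False], 0.5: [True, False, True]},
                     index=[12.5, 14, 15.5])

df = pd.concat([left, right], axis=1)

# Transpose so the (duplicated) column labels become the row index,
# group the rows that share a label, OR them together, then transpose back.
result = df.T.groupby(level=0).any().T
print(result)
```

The transpose is cheap for frames of this size; for very wide or mixed-dtype frames it may be worth checking that the dtypes survive the round trip.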

If you need to do this in a less case-specific manner, again group by the columns and apply a row-wise function (axis=1) to each group:

df = pd.concat([left, right], axis=1)
# Each group holds the columns that share a name; any(axis=1) ORs them row by row
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))
#        0.0   0.5    0.7
#12.5   True  True   True
#14.0   True  True  False
#15.5  False  True  False
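
The question mentions that the content could be floats instead of booleans. In that case the reducer can simply be swapped. Below is a small sketch that keeps the larger value wherever a column appears on both sides; the float data and the choice of max are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical float data with the same layout as the question's frames
left_f = pd.DataFrame({0: [1.0, 2.5, 0.0], 0.5: [0.2, 0.9, 0.4]},
                      index=[12.5, 14, 15.5])
right_f = pd.DataFrame({0.7: [3.0, 1.0, 2.0], 0.5: [0.6, 0.1, 0.4]},
                       index=[12.5, 14, 15.5])

df_f = pd.concat([left_f, right_f], axis=1)

# Group identically named columns (via the transpose) and keep the maximum per row
result_f = df_f.T.groupby(level=0).max().T
print(result_f)
```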

Further, you can define a custom combining function. Here's one which adds twice the left Frame to 4 times the right Frame. If a column name appears in only one Frame, it returns twice that column.

Sample Data

left:

      0.0  0.5
12.5    1   11
14.0    2   17
15.5    3   17

right:

      0.7  0.5
12.5    4    2
14.0    4   -1
15.5    5    5

Code

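def my_func(x):
    # x holds all columns that share one name: the left column first,
    # then the right column if the name exists in both Frames
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        # Only one column with this name: there is nothing to combine it with
        res = x.iloc[:, 0]*2
    return res

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)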

Output:

      0.0  0.5  0.7
12.5    2   30    8
14.0    4   30    8
15.5    6   54   10
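
If grouping along axis=1 is not available in your pandas version, the same result can be sketched with an explicit loop over the unique column names. The helper name combine_columns is hypothetical, and my_func here checks the number of columns instead of catching IndexError, which is a small variation on the function above.

```python
import pandas as pd

left = pd.DataFrame({0: [1, 2, 3], 0.5: [11, 17, 17]}, index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=[12.5, 14, 15.5])

def my_func(x):
    # Twice the first column, plus four times the second one if it exists
    if x.shape[1] > 1:
        return x.iloc[:, 0] * 2 + x.iloc[:, 1] * 4
    return x.iloc[:, 0] * 2

def combine_columns(frames, func):
    """Hypothetical helper: concatenate the frames side by side, then apply
    func to each block of identically named columns."""
    df = pd.concat(frames, axis=1)
    out = {}
    for name in df.columns.unique():
        # A boolean mask always yields a DataFrame, even for a single column
        block = df.loc[:, df.columns == name]
        out[name] = func(block)
    return pd.DataFrame(out, index=df.index).sort_index(axis=1)

result = combine_columns([left, right], my_func)
print(result)
```

The loop is slower than a vectorised groupby when there are many distinct column names, but it makes the order of the columns inside each block explicit, which matters for a non-symmetric function like this one.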

Finally, if you wanted to do this in a consecutive manner, then you should make use of reduce. Here I'll combine 5 DataFrames with the above function. (I'll just repeat the right Frame 4x for the example)

from functools import reduce

def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)

reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
#      0.0  0.5  0.7
#12.5   16  296  176
#14.0   32  212  176
#15.5   48  572  220
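
For the boolean OR case from the question, a sequential combination over more than two frames might look like the sketch below. The third frame and the helper name or_combine are made up for illustration, and the transpose-based grouping is used so that the code does not depend on groupby with axis=1.

```python
from functools import reduce

import pandas as pd

idx = [12.5, 14, 15.5]
left = pd.DataFrame({0: [True, True, False], 0.5: [False, True, True]}, index=idx)
right = pd.DataFrame({0.7: [True, False, False], 0.5: [True, False, True]}, index=idx)
# Hypothetical third frame with one new column (1.0) and one shared column (0.5)
third = pd.DataFrame({1.0: [False, True, False], 0.5: [False, False, False]}, index=idx)

def or_combine(df_l, df_r):
    """Hypothetical helper: OR together identically named columns of two frames."""
    df = pd.concat([df_l, df_r], axis=1)
    return df.T.groupby(level=0).any().T

result = reduce(or_combine, [left, right, third])
print(result)
```

Because OR does not depend on the order of its operands, concatenating all frames at once and grouping a single time should give the same result here; reduce is only really needed for order-dependent functions such as my_func.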


answered Nov 28 '22 09:11

ALollz
