Merging pandas dataframes on 2 columns but in either order

The problem:

I've got a situation with 2 dataframes:

import pandas as pd

test1 = pd.DataFrame({'id_A':['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B':['Julie', 'Ben', 'Nina', 'Julie']})

test2 = pd.DataFrame({'id_a':['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b':['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value':[1,1,0,0,1,0]})

>>> test1
    id_A   id_B
0    Ben  Julie
1  Julie    Ben
2   Jack   Nina
3   Jack  Julie

>>> test2
    id_a   id_b  value
0    Ben  Julie      1
1    Ben   Nina      1
2    Ben   Jack      0
3  Julie   Nina      0
4  Julie   Jack      1
5   Nina   Jack      0

What I'd like to do is merge test2 with test1 where id_A == id_a and id_B == id_b OR where id_A == id_b and id_B == id_a, resulting in the following dataframe:

>>> final_df
    id_A   id_B  value
0    Ben  Julie      1
1  Julie    Ben      1
2   Jack   Nina      0
3   Jack  Julie      1

Current Solution:

My solution works but seems messy, and I'd like to see if I'm overlooking some more clever way to do things. It involves concatenating test2 with itself, but reversing the 2 columns of interest (id_a becomes id_b and vice-versa), and then merging from there.

test3 = pd.concat([test2, test2.rename(columns = {'id_a':'id_b', 'id_b':'id_a'})])

final_df = (test1.merge(test3, left_on = ['id_A', 'id_B'],
                        right_on = ['id_a', 'id_b'])
            .drop(['id_a', 'id_b'], axis=1))

Question:

Does anyone know a neater way to do this? I feel like I'm probably overlooking some amazing pandorable way of doing things.

Thanks for your help!

asked May 22 '18 20:05 by sacuL



1 Answer

With frozenset, which is hashable and ignores the order of each pair:

test1.assign(
    value=test1.apply(frozenset, axis=1).map({frozenset(a): b for *a, b in test2.values}))

    id_A   id_B  value
0    Ben  Julie      1
1  Julie    Ben      1
2   Jack   Nina      0
3   Jack  Julie      1
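
The mapping works because ('Ben', 'Julie') and ('Julie', 'Ben') produce the same frozenset, so either order hits the same dictionary key. A minimal self-contained sketch of the same idea, spelled out step by step with the frames from the question (the names `lookup` and `keys` are introduced here purely for illustration):

```python
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})

# Order-insensitive lookup table: one frozenset key per row of test2.
lookup = {frozenset((a, b)): v
          for a, b, v in zip(test2['id_a'], test2['id_b'], test2['value'])}

# Build the keys for test1 the same way, one per row.
keys = [frozenset(pair) for pair in zip(test1['id_A'], test1['id_B'])]

# Pairs that are missing from test2 would come back as None.
final_df = test1.assign(value=[lookup.get(k) for k in keys])
```

Note that if test2 listed the same pair more than once (in either order), the dictionary would keep only the last value seen.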

Less cute, more robust: build a frozenset key column in both frames and merge on it. Drop whatever helper columns you don't need afterwards.

t1 = test1.assign(ref=list(map(frozenset, zip(test1.id_A, test1.id_B))))
t2 = test2.assign(ref=list(map(frozenset, zip(test2.id_a, test2.id_b))))

t1.merge(t2, on='ref')

    id_A   id_B            ref   id_a   id_b  value
0    Ben  Julie   (Julie, Ben)    Ben  Julie      1
1  Julie    Ben   (Julie, Ben)    Ben  Julie      1
2   Jack   Nina   (Jack, Nina)   Nina   Jack      0
3   Jack  Julie  (Jack, Julie)  Julie   Jack      1
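
To get from that merged frame to the shape asked for in the question, the helper columns can be dropped in the same chain. A rough sketch, assuming the question's frames; the name `merged` is only illustrative:

```python
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})

# Add an order-insensitive key column to each frame.
t1 = test1.assign(ref=list(map(frozenset, zip(test1.id_A, test1.id_B))))
t2 = test2.assign(ref=list(map(frozenset, zip(test2.id_a, test2.id_b))))

# Merge on the key, then drop the key and test2's id columns.
merged = t1.merge(t2, on='ref').drop(columns=['ref', 'id_a', 'id_b'])
```

The default inner merge drops test1 rows with no partner in test2; passing how='left' to merge would keep them with a missing value instead.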

answered Oct 28 '22 14:10 by piRSquared