I've got a situation with 2 dataframes:
import pandas as pd

test1 = pd.DataFrame({'id_A':['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B':['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a':['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b':['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value':[1,1,0,0,1,0]})
>>> test1
id_A id_B
0 Ben Julie
1 Julie Ben
2 Jack Nina
3 Jack Julie
>>> test2
id_a id_b value
0 Ben Julie 1
1 Ben Nina 1
2 Ben Jack 0
3 Julie Nina 0
4 Julie Jack 1
5 Nina Jack 0
What I'd like to do is merge test2 with test1 where id_A == id_a and id_B == id_b, OR where id_A == id_b and id_B == id_a, resulting in the following dataframe:
>>> final_df
id_A id_B value
0 Ben Julie 1
1 Julie Ben 1
2 Jack Nina 0
3 Jack Julie 1
My solution works but seems messy, and I'd like to see if I'm overlooking some more clever way to do things. It involves concatenating test2 with itself, but reversing the 2 columns of interest (id_a becomes id_b and vice-versa), and then merging from there.
test3 = pd.concat([test2, test2.rename(columns = {'id_a':'id_b', 'id_b':'id_a'})])
final_df = (test1.merge(test3, left_on = ['id_A', 'id_B'],
right_on = ['id_a', 'id_b'])
.drop(['id_a', 'id_b'], axis=1))
Does anyone know a neater way to do this? I feel like I'm probably overlooking some amazing pandorable way of doing things.
Thanks for your help!
Answer. Yes. The order in which the dataframes are merged affects the order of the rows and columns of the result. With the default inner join, merge() preserves the order of the left keys.
To merge two pandas DataFrames on multiple columns, use pandas.merge(). The same method is also available on DataFrame itself as DataFrame.merge().
Two DataFrames can be joined in several ways (left, right, inner, or outer) depending on which rows must appear in the final DataFrame. to_csv can be used to write DataFrames out in CSV format.
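To make the point about join types concrete, here is a small sketch using the question's own data. It only shows what a plain two-column merge does with inner versus left joins; the variable names inner and left are mine, not from the question.

```python
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})

# Default (inner) join on two columns: a row matches only when the pair
# appears in the same order on both sides.
inner = test1.merge(test2, left_on=['id_A', 'id_B'], right_on=['id_a', 'id_b'])

# how='left' keeps every row of test1, in its original order;
# rows without a match get NaN in the columns coming from test2.
left = test1.merge(test2, how='left',
                   left_on=['id_A', 'id_B'], right_on=['id_a', 'id_b'])

print(inner)
print(left)
```

With this data only the (Ben, Julie) row matches directly, which is why the question needs some way to treat (Julie, Ben) and (Ben, Julie) as the same pair.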
With frozenset
test1.assign(
    value=test1.apply(frozenset, axis=1).map(
        {frozenset(a): b for *a, b in test2.values}))
id_A id_B value
0 Ben Julie 1
1 Julie Ben 1
2 Jack Nina 0
3 Jack Julie 1
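In case the one-liner is hard to read, here is the same idea spelled out step by step. This is just a sketch of how I understand it; the names lookup and keys are mine. A frozenset ignores order, so frozenset(('Ben', 'Julie')) and frozenset(('Julie', 'Ben')) are the same dictionary key.

```python
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})

# Step 1: map each unordered pair of names in test2 to its value.
lookup = {frozenset((a, b)): v
          for a, b, v in zip(test2['id_a'], test2['id_b'], test2['value'])}

# Step 2: build the same kind of order-free key for every row of test1.
keys = [frozenset(pair) for pair in zip(test1['id_A'], test1['id_B'])]

# Step 3: look each key up; pairs missing from test2 would come back as None.
final_df = test1.assign(value=[lookup.get(k) for k in keys])
print(final_df)
```

Two things to keep in mind: if test2 contained both (a, b) and (b, a), the later row would silently overwrite the earlier one in the dictionary, and a pair with the same name twice collapses to a one-element frozenset.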
Less cute, more robust. Drop whichever columns you don't need afterwards.
t1 = test1.assign(ref=list(map(frozenset, zip(test1.id_A, test1.id_B))))
t2 = test2.assign(ref=list(map(frozenset, zip(test2.id_a, test2.id_b))))
t1.merge(t2, on='ref')
id_A id_B ref id_a id_b value
0 Ben Julie (Julie, Ben) Ben Julie 1
1 Julie Ben (Julie, Ben) Ben Julie 1
2 Jack Nina (Jack, Nina) Nina Jack 0
3 Jack Julie (Jack, Julie) Julie Jack 1
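To get from that merged frame to the final_df asked for in the question, the helper columns can be dropped after the merge. A minimal sketch, assuming the only columns you want to keep are id_A, id_B and value:

```python
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})

# Add an order-free key column to both frames.
t1 = test1.assign(ref=[frozenset(p) for p in zip(test1['id_A'], test1['id_B'])])
t2 = test2.assign(ref=[frozenset(p) for p in zip(test2['id_a'], test2['id_b'])])

# Merge on the key, then drop the key and test2's own id columns.
final_df = t1.merge(t2, on='ref').drop(columns=['ref', 'id_a', 'id_b'])
print(final_df)
```

Unlike the dictionary version, this keeps every matching row, so duplicate pairs in test2 would show up as extra rows instead of being overwritten.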