I have two dataframes:
import pandas as pd

df = pd.DataFrame.from_dict({'A': ['xy', 'yx', 'zy', 'zz'],
                             'B': [[1, 3], [4, 3, 5], [3], [2, 6]]})
df2 = pd.DataFrame.from_dict({'B': [1, 3, 4, 5, 6],
                              'C': ['pq', 'rs', 'pr', 'qs', 'sp']})
df looks like:
A B
0 xy [1, 3]
1 yx [4, 3, 5]
2 zy [3]
3 zz [2, 6]
df2 looks like:
B C
0 1 pq
1 3 rs
2 4 pr
3 5 qs
4 6 sp
I would like to combine these two to form res:
res = pd.DataFrame.from_dict({'A': ['xy', 'yx', 'zy', 'zz'],
                              'C': ['pq', 'pr', 'rs', 'sp']})
i.e.
A C
0 xy pq
1 yx pr
2 zy rs
3 zz sp
The row with xy in df has the list [1, 3]. There is a row with value 1 in column B of df2, and the C column has value pq in that row, so I combine xy with pq. The same goes for the next two rows. For the last row, there is no 2 in column B of df2, so I fall back to the value 6 (the last row in df has the list [2, 6]).
How can I achieve this without iterating through the dataframe?
A very similar post in Spanish SO, which inspired this post.
To merge two pandas DataFrames on one or more columns, use pandas.merge(); the same operation is also available as the DataFrame.merge() method.
You can explode "B" into separate rows, then merge on "B" and drop duplicates.
Big thanks to Asish M. in the comments for pointing out a potential bug with the ordering.
(df.explode('B')
.merge(df2, on='B', how='left')
.dropna(subset=['C'])
.drop_duplicates('A'))
A B C
0 xy 1 pq
2 yx 4 pr
5 zy 3 rs
7 zz 6 sp
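The output above still carries the helper column B and the index left over from the exploded frame. As a minimal sketch (not part of the original answer), the same chain can be extended to produce exactly the res frame from the question. The astype cast is an added assumption: it presumes every list in B is non-empty and holds only integers, so the merge key has the same dtype on both sides.

```python
import pandas as pd

df = pd.DataFrame({'A': ['xy', 'yx', 'zy', 'zz'],
                   'B': [[1, 3], [4, 3, 5], [3], [2, 6]]})
df2 = pd.DataFrame({'B': [1, 3, 4, 5, 6],
                    'C': ['pq', 'rs', 'pr', 'qs', 'sp']})

res = (
    df.explode('B')                      # one row per list element
      .astype({'B': 'int64'})            # assumption: only integers after explode
      .merge(df2, on='B', how='left')    # left merge keeps the exploded row order
      .dropna(subset=['C'])              # discard elements with no match in df2
      .drop_duplicates('A')              # keep the first surviving match per A
      .loc[:, ['A', 'C']]                # drop the helper column B
      .reset_index(drop=True)            # renumber rows 0..n-1
)
print(res)
```

Note that a row of df whose list has no match at all in df2 disappears from the result, because dropna removes all of its exploded rows.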
Ideally, the following should have worked:
df.explode('B').merge(df2).drop_duplicates('A')
However, pandas (version 1.2dev at the time of writing) does not preserve the ordering of the left keys on a merge, which is a bug; see GH18776.
In the meantime, we can use the workaround of a left merge as shown above.
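If you would rather not depend on merge ordering at all, one possible alternative (a sketch of a different technique, not what the answer above uses) is to look the values up with Series.map and take the first non-missing hit per original row. The names lookup, matches and first_match are illustrative only, and the sketch assumes the values in df2's column B are unique.

```python
import pandas as pd

df = pd.DataFrame({'A': ['xy', 'yx', 'zy', 'zz'],
                   'B': [[1, 3], [4, 3, 5], [3], [2, 6]]})
df2 = pd.DataFrame({'B': [1, 3, 4, 5, 6],
                    'C': ['pq', 'rs', 'pr', 'qs', 'sp']})

# Lookup table B -> C (assumption: df2['B'] has no duplicates).
lookup = df2.set_index('B')['C']

# Explode keeps the original row label on every element, so each
# element can be mapped to C; elements missing from df2 become NaN.
matches = df['B'].explode().map(lookup)

# first() skips missing values, giving the first match per original row.
first_match = matches.groupby(level=0).first()

# Align back onto df by index.
res = df[['A']].assign(C=first_match)
print(res)
```

Unlike the merge version, a row with no match at all stays in the result with a missing C rather than being dropped.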