import pandas as pd
I have two dataframes:
df=\
pd.DataFrame.from_dict({'A':['xy','yx','zy','zz'],
'B':[[1, 3],[4, 3, 5],[3],[2, 6]]})
df2=\
pd.DataFrame.from_dict({'B':[1,3,4,5,6],
'C':['pq','rs','pr','qs','sp']})
df looks like:
A B
0 xy [1, 3]
1 yx [4, 3, 5]
2 zy [3]
3 zz [2, 6]
df2 looks like:
B C
0 1 pq
1 3 rs
2 4 pr
3 5 qs
4 6 sp
I would like to combine these two to form res:
res=\
pd.DataFrame.from_dict({'A':['xy','yx','zy','zz'],
'C':['pq','pr','rs','sp']})
ie
A C
0 xy pq
1 yx pr
2 zy rs
3 zz sp
The row with xy in df has the lsit [1,3]. There is a row with value 1 in column B in df2. The C column has value pq in that row, so I combine xy with pq. Same for the next two rows. Last row: there is no value with 2 in column B in df2, so I go for the value 6 (the last row in df has the list [2,6]).
How can I achieve this without iterating through the dataframe?
A very similar post in Spanish SO, which inspired this post.
To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.
You can explode "B" into separate rows, then merge on "B" and drop duplicates.
Big thanks to Asish M. in the comments for pointing out a potential bug with the ordering.
(df.explode('B')
.merge(df2, on='B', how='left')
.dropna(subset=['C'])
.drop_duplicates('A'))
A B C
0 xy 1 pq
2 yx 4 pr
5 zy 3 rs
7 zz 6 sp
Ideally, the following should have worked:
df.explode('B').merge(df2).drop_duplicates('A')
However, pandas (as of writing, version 1.2dev) does not preserve the ordering of the left keys on a merge which is a bug, see GH18776.
In the meantime, we can use the workaround of a left merge as shown above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With