Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I combine two dataframes based on a column of lists in Pandas

import pandas as pd

Reproducible setup

I have two dataframes:

df=\
pd.DataFrame.from_dict({'A':['xy','yx','zy','zz'],
                        'B':[[1, 3],[4, 3, 5],[3],[2, 6]]})

df2=\
pd.DataFrame.from_dict({'B':[1,3,4,5,6],
                        'C':['pq','rs','pr','qs','sp']})

df looks like:

    A          B
0  xy     [1, 3]
1  yx  [4, 3, 5]
2  zy        [3]
3  zz     [2, 6]

df2 looks like:

   B   C
0  1  pq
1  3  rs
2  4  pr
3  5  qs
4  6  sp

Aim

I would like to combine these two to form res:

res=\
pd.DataFrame.from_dict({'A':['xy','yx','zy','zz'],
                        'C':['pq','pr','rs','sp']})

ie

    A   C
0  xy  pq
1  yx  pr
2  zy  rs
3  zz  sp

The row with xy in df has the lsit [1,3]. There is a row with value 1 in column B in df2. The C column has value pq in that row, so I combine xy with pq. Same for the next two rows. Last row: there is no value with 2 in column B in df2, so I go for the value 6 (the last row in df has the list [2,6]).


Question

How can I achieve this without iterating through the dataframe?


A very similar post in Spanish SO, which inspired this post.

like image 796
zabop Avatar asked Dec 25 '20 00:12

zabop


People also ask

How do I join two DataFrames in pandas based on multiple columns?

To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.


1 Answers

You can explode "B" into separate rows, then merge on "B" and drop duplicates.

Big thanks to Asish M. in the comments for pointing out a potential bug with the ordering.

(df.explode('B')
   .merge(df2, on='B', how='left')
   .dropna(subset=['C'])
   .drop_duplicates('A'))

    A  B   C
0  xy  1  pq
2  yx  4  pr
5  zy  3  rs
7  zz  6  sp

Ideally, the following should have worked:

df.explode('B').merge(df2).drop_duplicates('A')

However, pandas (as of writing, version 1.2dev) does not preserve the ordering of the left keys on a merge which is a bug, see GH18776.

In the meantime, we can use the workaround of a left merge as shown above.

like image 169
cs95 Avatar answered Sep 28 '22 07:09

cs95