Suppose we have two Pandas DataFrames as follows:
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
  id
0  a
1  b
2  c
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})
df2
      ids  info
0  [b, c]  asdf
1  [a, b]  zxcv
2  [a, z]  sdfg
How do I join/merge the rows of df1 with df2 where df1.id is in df2.ids?
In other words, how do I achieve the following:
df3
  id     ids  info
0  a  [a, b]  zxcv
1  a  [a, z]  sdfg
2  b  [b, c]  asdf
3  b  [a, b]  zxcv
4  c  [b, c]  asdf
And also a version of the above aggregated on id, like so:
df3
  id               ids          info
0  a  [[a, b], [a, z]]  [zxcv, sdfg]
1  b  [[a, b], [b, c]]  [zxcv, asdf]
2  c          [[b, c]]        [asdf]
I tried the following:
df1.merge(df2, how='left', left_on='id', right_on='ids')
TypeError: unhashable type: 'list'

df1.id.isin(df2.ids)
TypeError: unhashable type: 'list'
The merge fails because the values in df2.ids are lists, which are unhashable and so can't be used as join keys; the lists need to be flattened to one id per row first. Using stack, merge and groupby.agg:
df = (df2.set_index('info').ids.apply(pd.Series)  # one column per list element
         .stack().reset_index(0, name='id')       # long form: one id per row
         .merge(df2)                              # re-attach the ids lists via info
         .merge(df1, how='right')                 # keep only ids present in df1
         .sort_values('id').reset_index(drop=True))
print(df)
   info id     ids
0  zxcv  a  [a, b]
1  sdfg  a  [a, z]
2  asdf  b  [b, c]
3  zxcv  b  [a, b]
4  asdf  c  [b, c]
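As an aside, on pandas 0.25+ the apply(pd.Series)/stack flattening can be replaced with DataFrame.explode. A minimal sketch of the same idea, assuming the df1/df2 defined in the question (the names flat and df3 are just illustrative):

flat = df2.assign(id=df2['ids']).explode('id')  # one row per single id; 'ids' keeps the list
df3 = df1.merge(flat, on='id')                  # inner join drops ids absent from df1 ('z')
print(df3)

This yields the same five rows, with the id, ids, info column order from the question because df1 is the left frame.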
For aggregation use:
df = df.groupby('id', as_index=False).agg(list)
print(df)
  id          info               ids
0  a  [zxcv, sdfg]  [[a, b], [a, z]]
1  b  [asdf, zxcv]  [[b, c], [a, b]]
2  c        [asdf]          [[b, c]]
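To get the question's id, ids, info column order directly, named aggregation (also pandas 0.25+) lets you name and order the output columns explicitly; a sketch applied to the df3 from the explode example above:

df3_agg = (df3.groupby('id')
              .agg(ids=('ids', list), info=('info', list))  # output columns, in this order
              .reset_index())
print(df3_agg)

The content matches the aggregated output above; only the column order differs.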