Comparing lists in two columns row-wise efficiently

Tags: python, pandas, dataframe, numpy

Given a pandas DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'today': [['a', 'b', 'c'], ['a', 'b'], ['b']], 
                   'yesterday': [['a', 'b'], ['a'], ['a']]})

                 today        yesterday
0      ['a', 'b', 'c']       ['a', 'b']
1           ['a', 'b']            ['a']
2                ['b']            ['a']                          
... etc

The real data has about 100,000 rows. For each row, I want to find the additions and removals between the lists in the two columns.

It is comparable to this question: "Pandas: How to Compare Columns of Lists Row-wise in a DataFrame with Pandas (not for loop)?", but I am looking for the differences, and the pandas apply method does not seem fast enough for this many entries. This is the code I am currently using, which combines DataFrame.apply with NumPy's setdiff1d:

additions = df.apply(lambda row: np.setdiff1d(row.today, row.yesterday), axis=1)
removals  = df.apply(lambda row: np.setdiff1d(row.yesterday, row.today), axis=1)

This works fine, but it takes about a minute for 120,000 entries. Is there a faster way to accomplish this?

asked Jan 08 '20 19:01

MegaCookie

1 Answer

I am not sure about the performance, but for lack of a better solution this might work:

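# Convert every list to a set so that "-" means set difference
temp = df[['today', 'yesterday']].applymap(set)
# Column-wise diff: yesterday - today, i.e. items present yesterday but gone today
removals = temp.diff(periods=1, axis=1).dropna(axis=1)
# Reverse direction: today - yesterday, i.e. items that are new today
additions = temp.diff(periods=-1, axis=1).dropna(axis=1)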

Removals:

  yesterday
0        {}
1        {}
2       {a}

Additions:

  today
0   {c}
1   {b}
2   {b}
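
Another option, offered as a minimal sketch rather than a measured result: compute the set differences in a plain list comprehension over the two columns. This avoids building a Series object for every row, which is what df.apply(..., axis=1) does, so it is often faster, but it has not been benchmarked here on the 120,000-row data. The sample DataFrame is the one from the question; the names additions and removals follow the question's code.

```python
import pandas as pd

df = pd.DataFrame({'today': [['a', 'b', 'c'], ['a', 'b'], ['b']],
                   'yesterday': [['a', 'b'], ['a'], ['a']]})

# Pair up the two columns row by row and take the set difference in each
# direction. No per-row Series is created, unlike df.apply(..., axis=1).
additions = pd.Series(
    [set(t) - set(y) for t, y in zip(df['today'], df['yesterday'])],
    index=df.index,
)
removals = pd.Series(
    [set(y) - set(t) for t, y in zip(df['today'], df['yesterday'])],
    index=df.index,
)
```

Note that sets are unordered and drop duplicates. np.setdiff1d also returns unique values, but sorted; wrapping each difference in sorted(...) would give sorted lists instead of sets if that ordering matters.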

answered Sep 28 '22 02:09

r.ook
