Comparing previous row values in Pandas DataFrame

    import pandas as pd

    data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
    df = pd.DataFrame(data, columns=['col1'])
    print(df)

       col1
    0     1
    1     3
    2     3
    3     1
    4     2
    5     3
    6     2
    7     2

I have the Pandas DataFrame above and I want to create another column that compares each value of col1 with the value in the previous row to see if they are equal. What would be the best way to do this? The result should look like the following DataFrame. Thanks

       col1  match
    0     1  False
    1     3  False
    2     3   True
    3     1  False
    4     2  False
    5     3  False
    6     2  False
    7     2   True

asked Dec 30 '16 16:12

jth359

2 Answers

You need eq with shift:

    df['match'] = df.col1.eq(df.col1.shift())
    print(df)

       col1  match
    0     1  False
    1     3  False
    2     3   True
    3     1  False
    4     2  False
    5     3  False
    6     2  False
    7     2   True

Or use == instead of eq, though it is a bit slower on a large DataFrame:

    df['match'] = df.col1 == df.col1.shift()
    print(df)

       col1  match
    0     1  False
    1     3  False
    2     3   True
    3     1  False
    4     2  False
    5     3  False
    6     2  False
    7     2   True
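To see why the first row always comes out False, it may help to look at what shift returns before the comparison. This is only a minimal sketch reusing the data from the question; the names prev and match are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# shift() moves every value down one row; row 0 has no predecessor,
# so it is filled with NaN (and the column becomes float)
prev = df['col1'].shift()

# NaN never compares equal to a number, so row 0 is False
match = df['col1'].eq(prev)
print(pd.DataFrame({'col1': df['col1'], 'prev': prev, 'match': match}))
```

If the first row should be treated differently, the NaN in prev is the place to handle it, for example by passing a fill_value to shift.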

Timings:

    import pandas as pd

    data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
    df = pd.DataFrame(data, columns=['col1'])
    print(df)

    # [80000 rows x 1 columns]
    df = pd.concat([df]*10000).reset_index(drop=True)

    df['match'] = df.col1 == df.col1.shift()
    df['match1'] = df.col1.eq(df.col1.shift())
    print(df)

    In [208]: %timeit df.col1.eq(df.col1.shift())
    The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 933 µs per loop

    In [209]: %timeit df.col1 == df.col1.shift()
    1000 loops, best of 3: 1 ms per loop

answered Oct 08 '22 22:10

jezrael

1) pandas approach: Use diff:

    df['match'] = df['col1'].diff().eq(0)

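2) numpy approach: Use np.ediff1d. The column is cast to float here so that NaN can pad the first position; recent NumPy versions reject a NaN to_begin for integer input, and the np.NaN alias has been removed in favour of np.nan.

    import numpy as np

    df['match'] = np.ediff1d(df['col1'].to_numpy(dtype=float), to_begin=np.nan) == 0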

Both produce:

       col1  match
    0     1  False
    1     3  False
    2     3   True
    3     1  False
    4     2  False
    5     3  False
    6     2  False
    7     2   True

Timings: (for the same DF used by @jezrael)

    %timeit df.col1.eq(df.col1.shift())
    1000 loops, best of 3: 731 µs per loop

    %timeit df['col1'].diff().eq(0)
    1000 loops, best of 3: 405 µs per loop
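One difference worth keeping in mind when choosing between the two: diff subtracts neighbouring values, so it is meant for numeric columns, while the shift comparison only needs equality. The sketch below checks that both give the same result on the question's data and then applies the shift version to a string column; s_str is a hypothetical example, not data from the question.

```python
import pandas as pd

s_num = pd.Series([1, 3, 3, 1, 2, 3, 2, 2])
s_str = pd.Series(['a', 'b', 'b', 'a'])  # hypothetical non-numeric data

# On numeric data the two approaches give the same boolean Series
by_shift = s_num.eq(s_num.shift())
by_diff = s_num.diff().eq(0)
agree = by_shift.equals(by_diff)
print(agree)

# The shift comparison relies only on equality, so it also works on strings
str_match = s_str.eq(s_str.shift())
print(str_match.tolist())
```

Timings like the ones above depend on the pandas version and the data, so it is worth re-running %timeit on your own DataFrame before choosing on speed alone.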

answered Oct 08 '22 23:10

Nickil Maveli

Tags: python, pandas, boolean, numpy, shift
