Pandas "diff()" with string

Tags: python, pandas

How can I flag a row in a dataframe every time a column changes its string value?

Ex:

Input

ColumnA   ColumnB
1         Blue
2         Blue
3         Red
4         Red
5         Yellow

# diff won't work here with strings.... only works in numerical values
dataframe['changed'] = dataframe['ColumnB'].diff()

ColumnA   ColumnB   changed
1         Blue      0
2         Blue      0
3         Red       1
4         Red       0
5         Yellow    1

1000

asked Oct 31 '16 18:10

guilhermecgs

2 Answers

Use .shift and compare:

dataframe['changed'] = (dataframe['ColumnB'] != dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])).astype(int)
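To make the shift-and-compare idea concrete, here is a minimal sketch on the sample data from the question. It compares with != so that changed rows are the ones flagged. The intermediate name previous is only for illustration.

```python
import pandas as pd

# Sample data from the question.
dataframe = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Value of the previous row; the first row has no predecessor, so it is missing.
previous = dataframe["ColumnB"].shift(1)

# Fill that first gap with the row's own value so row 0 is not flagged as a change.
previous = previous.fillna(dataframe["ColumnB"])

# 1 where the value differs from the previous row, 0 otherwise.
dataframe["changed"] = (dataframe["ColumnB"] != previous).astype(int)
print(dataframe)
```

On this data the changed column comes out as 0, 0, 1, 0, 1, which matches the output asked for in the question.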

answered Sep 18 '22 04:09

Kartik

I get better performance with ne instead of using the actual != comparison:

df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
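As a quick sanity check, the sketch below (using the question's sample column, with variable names chosen only for illustration) suggests that .ne and the != operator produce the same flags, so any difference between them should be about speed only.

```python
import pandas as pd

df = pd.DataFrame({"ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"]})

# Previous row's value; bfill() fills the missing first entry from the next one,
# which is the original first value.
previous = df["ColumnB"].shift().bfill()

with_ne = df["ColumnB"].ne(previous).astype(int)
with_operator = (df["ColumnB"] != previous).astype(int)

# True when both approaches flag exactly the same rows.
same_result = with_ne.equals(with_operator)
print(same_result)
```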

Timings

Using the following setup to produce a larger dataframe:

df = pd.concat([df]*10**5, ignore_index=True)

I get the following timings:

%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
10 loops, best of 3: 38.1 ms per loop

%timeit (df.ColumnB != df.ColumnB.shift()).astype(int)
10 loops, best of 3: 77.7 ms per loop

%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB'])
10 loops, best of 3: 99.6 ms per loop

%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
10 loops, best of 3: 19.3 ms per loop
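For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The repetition count (10**3 rather than 10**5), the number of runs, and the labels are my own assumptions to keep it quick. Absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"]})
# Smaller than the 10**5 repetitions used above, so the sketch runs quickly.
df = pd.concat([base] * 10**3, ignore_index=True)

candidates = {
    "ne + bfill": lambda: df["ColumnB"].ne(df["ColumnB"].shift().bfill()).astype(int),
    "!= without fill": lambda: (df["ColumnB"] != df["ColumnB"].shift()).astype(int),
    "ne without fill": lambda: df["ColumnB"].ne(df["ColumnB"].shift()).astype(int),
}

for label, func in candidates.items():
    # Average wall-clock time per call over 20 runs.
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{label}: {seconds * 1000:.3f} ms per call")

# The variants without a fill compare the first row against a missing value,
# so they flag row 0 as changed; the filled variant does not.
first_row_flags = {label: int(func().iloc[0]) for label, func in candidates.items()}
print(first_row_flags)
```

One thing to keep in mind when picking the fastest line: the variants that skip the fill mark the very first row as changed, which differs from the output requested in the question.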

138

answered Sep 18 '22 04:09

root
