Determining when a column value changes in pandas dataframe

Tags: python, search, pandas, dataframe, csv

I am looking to write a quick script that will run through a csv file with two columns and provide me the rows in which the values in column B switch from one value to another:

For example, this dataframe:

    # |  A  |  B
    --+-----+-----
    1 |  2  |  3
    2 |  3  |  3
    3 |  4  |  4
    4 |  5  |  4
    5 |  5  |  4

would tell me that the change happened between row 2 and row 3. I know how to get these values using for loops but I was hoping there was a more pythonic way of approaching this problem.

517

asked May 12 '15 16:05

badrobit

2 Answers

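You can create a new column holding the difference between each value in B and the value in the row before it:

    > df['C'] = df['B'].diff()
    > print(df)
       #  A  B    C
    0  1  2  3  NaN
    1  2  3  3  0.0
    2  3  4  4  1.0
    3  4  5  4  0.0
    4  5  5  4  0.0
    > df_filtered = df[df['C'].fillna(0) != 0]
    > print(df_filtered)
       #  A  B    C
    2  3  4  4  1.0

This gives you the required rows. The first row has no predecessor, so its difference is NaN; since NaN != 0 evaluates to True, it is filled with 0 before filtering so that row 0 is not reported as a change.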
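As a self-contained sketch of the same idea, the snippet below rebuilds the question's sample data by hand (assuming the `#` column is an ordinary column rather than the index) and filters on the difference without adding a helper column. The names `delta` and `changed` are only illustrative.

```python
import pandas as pd

# Sample data from the question; '#' is kept as an ordinary column (an assumption).
df = pd.DataFrame({
    "#": [1, 2, 3, 4, 5],
    "A": [2, 3, 4, 5, 5],
    "B": [3, 3, 4, 4, 4],
})

# diff() subtracts the previous row's value; the first row has no
# predecessor, so its difference is NaN.
delta = df["B"].diff()

# Keep only rows whose difference is a real, non-zero number.
changed = df[delta.notna() & (delta != 0)]
print(changed)
```

This only works for numeric columns, because `diff()` relies on subtraction.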

185

answered Sep 20 '22 17:09

Kathirmani Sukumar

You can do the following which also works for non numerical values:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"Status": ["A","A","B","B","C","C","C"]})
    >>> # Compare each value with the previous one; pass a scalar as fill_value.
    >>> df["isStatusChanged"] = df["Status"].shift(1, fill_value=df["Status"].iloc[0]) != df["Status"]
    >>> df
      Status  isStatusChanged
    0      A            False
    1      A            False
    2      B             True
    3      B            False
    4      C             True
    5      C            False
    6      C            False

Note that the right fill_value depends on your application: filling with the column's first value, as intended here, means row 0 is never reported as a change.
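To report where the change happened ("between row 2 and row 3" in the question's terms), one option is to turn the boolean mask into pairs of index labels. The helper below is a hypothetical sketch, not part of pandas; `change_points` is a name made up for illustration. It treats the first row as unchanged, and because NaN never compares equal to NaN, consecutive missing values would each count as a change.

```python
import pandas as pd

def change_points(series):
    """Return (previous_label, label) pairs where a value differs from the row before it."""
    if series.empty:
        return []
    # True wherever the value is not equal to the previous row's value.
    changed = series.ne(series.shift())
    # The first row has no predecessor, so never report it.
    changed.iloc[0] = False
    labels = series.index
    positions = [i for i, flag in enumerate(changed.to_numpy()) if flag]
    return [(labels[i - 1], labels[i]) for i in positions]

status = pd.Series(["A", "A", "B", "B", "C", "C", "C"])
pairs = change_points(status)
print(pairs)
```

For a CSV file, the same helper could be applied to a column loaded with `pd.read_csv`, for example `change_points(pd.read_csv(path)["B"])`, where `path` stands in for your own file.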

answered Sep 20 '22 17:09

Hagalín Ásgrímur Guðmundsson
