Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determining when a column value changes in pandas dataframe

I am looking to write a quick script that will run through a csv file with two columns and provide me the rows in which the values in column B switch from one value to another:

eg:

dataframe:

# |  A  |  B   --+-----+----- 1 |  2  |  3 2 |  3  |  3 3 |  4  |  4 4 |  5  |  4 5 |  5  |  4 

would tell me that the change happened between row 2 and row 3. I know how to get these values using for loops but I was hoping there was a more pythonic way of approaching this problem.

like image 517
badrobit Avatar asked May 12 '15 16:05

badrobit


People also ask

What does diff () do in Pandas?

The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.

How do you determine whether a Pandas column contains a particular value?

You can check if a column contains/exists a particular value (string/int), list of multiple values in pandas DataFrame by using pd. series() , in operator, pandas. series. isin() , str.

What does .values do in Pandas?

The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.


2 Answers

You can create a new column for the difference

> df['C'] = df['B'].diff() > print df    #  A  B   C 0  1  2  3 NaN 1  2  3  3   0 2  3  4  4   1 3  4  5  4   0 4  5  5  4   0  > df_filtered = df[df['C'] != 0] > print df_filtered    #  A  B  C 2  3  4  4  1 

This will your required rows

like image 185
Kathirmani Sukumar Avatar answered Sep 20 '22 17:09

Kathirmani Sukumar


You can do the following which also works for non numerical values:

>>> import pandas as pd >>> df = pd.DataFrame({"Status": ["A","A","B","B","C","C","C"]}) >>> df["isStatusChanged"] = df["Status"].shift(1, fill_value=df["Status"].head(1)) != df["Status"] >>> df   Status  isStatusChanged 0      A            False 1      A            False 2      B             True 3      B            False 4      C             True 5      C            False 6      C            False >>>   

Note the fill_value could be different depending on your application.

like image 42
Hagalín Ásgrímur Guðmundsson Avatar answered Sep 20 '22 17:09

Hagalín Ásgrímur Guðmundsson