I am looking to write a quick script that runs through a CSV file with two columns and gives me the rows where the value in column B switches from one value to another:
For example, given this DataFrame:

```
# | A | B
--+---+---
1 | 2 | 3
2 | 3 | 3
3 | 4 | 4
4 | 5 | 4
5 | 5 | 4
```
would tell me that the change happened between row 2 and row 3. I know how to get these values using for loops but I was hoping there was a more pythonic way of approaching this problem.
The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
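A small sketch of how `periods` changes the comparison, using column B from the example above. It calls `Series.diff`, which behaves like `DataFrame.diff` applied to a single column; note that the result is float because the first row(s) have no partner and become NaN.

```python
import pandas as pd

df = pd.DataFrame({"B": [3, 3, 4, 4, 4]})

# Default periods=1: each value minus the value in the previous row
d1 = df["B"].diff()
# periods=2: each value minus the value two rows earlier
d2 = df["B"].diff(periods=2)
# periods=-1: each value minus the value in the *next* row
d_next = df["B"].diff(periods=-1)

print(d1.tolist())      # [nan, 0.0, 1.0, 0.0, 0.0]
print(d2.tolist())      # [nan, nan, 1.0, 1.0, 0.0]
print(d_next.tolist())  # [0.0, -1.0, 0.0, 0.0, nan]
```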
You can check whether a column contains a particular value (string or int), or any of a list of values, by using the in operator on the column's values, pandas.Series.isin(), or the str accessor methods.
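A minimal sketch of these membership checks, assuming the two-column frame from the question. One easy-to-miss detail: `x in series` tests the index labels, not the values, so the value check goes through `.values`. The string Series here is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4, 5, 5], "B": [3, 3, 4, 4, 4]})

# Caution: `in` on a Series checks the index labels, not the values
label_hit = 0 in df["B"]          # True: 0 is an index label
value_hit = 0 in df["B"].values   # False: 0 is not a value in column B

# isin() gives a boolean mask marking values that appear in the list
mask = df["B"].isin([4, 7])
has_any = bool(mask.any())        # True: at least one 4 is present

# For string columns, the .str accessor provides substring checks
s = pd.Series(["apple", "banana", "cherry"])
contains_an = s.str.contains("an")

print(mask.tolist())         # [False, False, True, True, True]
print(contains_an.tolist())  # [False, True, False]
```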
The values property returns a NumPy representation of the DataFrame: only the values are returned, and the axis labels are removed. A DataFrame whose columns all share the same dtype (e.g., int64) yields an array of that same dtype.
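A short sketch of what `.values` hands back, plus a loop-free way to find the change positions on the raw array. It uses `to_numpy()`, which newer pandas documentation recommends over `.values`; the mixed-type frame is an illustrative assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4, 5, 5], "B": [3, 3, 4, 4, 4]})

# All columns are int64, so the array is int64 too; the row/column labels are gone
arr = df.values
print(arr.dtype, arr.shape)  # int64 (5, 2)

# Mixed column types fall back to a common dtype (here: object)
mixed = pd.DataFrame({"A": [1, 2], "S": ["x", "y"]}).values
print(mixed.dtype)  # object

# Compare neighbouring values in column B:
# b[1:] != b[:-1] is True wherever a row differs from the one before it
b = df["B"].to_numpy()
change_pos = np.flatnonzero(b[1:] != b[:-1]) + 1  # 0-based position of the row after each change
print(change_pos)  # [2]
```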
You can create a new column holding the difference from the previous row, then keep only the rows where it is non-zero:
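```python
>>> df['C'] = df['B'].diff()
>>> print(df)
   #  A  B    C
0  1  2  3  NaN
1  2  3  3  0.0
2  3  4  4  1.0
3  4  5  4  0.0
4  5  5  4  0.0
>>> # NaN != 0 is True, so fill the first row's NaN with 0 to avoid flagging it
>>> df_filtered = df[df['C'].fillna(0) != 0]
>>> print(df_filtered)
   #  A  B    C
2  3  4  4  1.0
```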
This gives you the required rows: each row where B differs from the row before it.
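To report the result the way the question phrases it ("between row 2 and row 3"), here is a sketch that reads the CSV and turns the change positions into 1-based row pairs. It assumes the file has a header row naming the columns A and B; `io.StringIO` stands in for the real file, which you would read with something like `pd.read_csv("your_file.csv")` (a hypothetical filename).

```python
import io

import pandas as pd

# Stand-in for the real file; assumes a header row naming the columns A and B
csv_text = """A,B
2,3
3,3
4,4
5,4
5,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# True where B differs from the previous row; fillna(0) keeps the first row (NaN diff) from counting
changed = df["B"].diff().fillna(0).ne(0)

# 0-based positions of the first row after each change
after = changed.to_numpy().nonzero()[0]
# Convert to 1-based (row before, row after) pairs, matching the question's numbering
pairs = [(int(i), int(i) + 1) for i in after]
print(pairs)  # [(2, 3)]
```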
You can also do the following, which works for non-numerical values as well:
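```python
>>> import pandas as pd
>>> df = pd.DataFrame({"Status": ["A", "A", "B", "B", "C", "C", "C"]})
>>> # Compare each row with the previous one; fill the first row's gap with its own
>>> # (scalar) value so that the first row is not flagged as a change
>>> df["isStatusChanged"] = df["Status"].shift(1, fill_value=df["Status"].iloc[0]) != df["Status"]
>>> df
  Status  isStatusChanged
0      A            False
1      A            False
2      B             True
3      B            False
4      C             True
5      C            False
6      C            False
```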
Note that the appropriate fill_value may differ depending on your application.
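A variant that sidesteps choosing a `fill_value`: compare against the plain `shift()` (which leaves NaN in the first row) and then explicitly mark the first row as "no change". As a sketch, it also collects each transition as a from/to pair; the `from`/`to` column names are illustrative choices, not anything pandas defines.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["A", "A", "B", "B", "C", "C", "C"]})
s = df["Status"]

# Compare with the previous row; row 0 compares against NaN and would be True
changed = s.ne(s.shift())
# Row 0 has no predecessor, so it cannot be a change
changed.iloc[0] = False

# One row per transition, indexed by the row where the new value starts
transitions = pd.DataFrame({
    "from": s.shift()[changed],
    "to": s[changed],
})
print(transitions)
```

Because it never substitutes a value for the missing predecessor, the same code works unchanged for numeric and string columns.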