
Conditionally drop Pandas Dataframe row

I want to drop every row whose neighbours (the row just before and the row just after) have the same value in the column num2. My dataframe looks like this:

import pandas as pd

df = pd.DataFrame([
    [12, 10],
    [11, 10],
    [13, 10],
    [42, 11],
    [4, 11],
    [5, 2]
], columns=["num1", "num2"]
)

And this is my target:

df = pd.DataFrame([
    [12, 10],
    [13, 10],
    [42, 11],
    [4, 11],
    [5, 2]
], columns=["num1", "num2"]
)

What I have tried:

df["num1_diff"] = df["num2"].diff().fillna(0).astype(int)
filt = df["num1_diff"].apply(lambda x: x == 0)
print(df[filt])

Giving:

   num1  num2  num1_diff
0    12    10          0
1    11    10          0
2    13    10          0
4     4    11          0

I was thinking of using the new num1_diff column to do the filtering. Is this a good approach, or is there a better one?

Gustav Rasmussen asked Jul 16 '20 11:07


1 Answer

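Use Series.shift twice to line up each row's previous and next num2 values, then keep only the rows where those two values differ. The first and last rows are always kept, because the missing neighbour shifts in as NaN and NaN never compares equal: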

df[df['num2'].shift().ne(df['num2'].shift(-1))]

   num1  num2
0    12    10
2    13    10
3    42    11
4     4    11
5     5     2
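
For readers who want to see the intermediate steps, here is a minimal, self-contained sketch of the same idea using the sample data from the question. The names prev_num2, next_num2, keep, strict_keep and the two result variables are illustrative and not part of the original answer. The stricter variant at the end is an assumption about one possible reading of the question (drop a row only if it also shares the value of both neighbours); on this sample data both variants give the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    [[12, 10], [11, 10], [13, 10], [42, 11], [4, 11], [5, 2]],
    columns=["num1", "num2"],
)

# Neighbouring values of num2, aligned to the current row's index.
prev_num2 = df["num2"].shift()     # row above; NaN for the first row
next_num2 = df["num2"].shift(-1)   # row below; NaN for the last row

# Keep a row unless its two neighbours agree. NaN never compares equal,
# so the first and last rows are always kept.
keep = prev_num2.ne(next_num2)
result = df[keep]
print(result)

# Stricter variant (assumed reading of the question): drop a row only
# when it also has the same num2 value as both of its neighbours.
strict_keep = ~(prev_num2.eq(df["num2"]) & next_num2.eq(df["num2"]))
strict_result = df[strict_keep]
print(strict_result)
```

The two masks diverge on data such as num2 = [10, 11, 10]: the neighbour-only mask drops the middle row, while the stricter mask keeps it. Pick whichever matches the intended rule.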

Erfan answered Sep 30 '22 10:09