
Pandas DataFrame - delete rows that have same value at a particular column as a previous row

I have a pandas DataFrame. For each row, I want to check whether it has the same value in a particular column (let's call it product_type) as the previous row, and if it does, delete it. In other words, out of each group of consecutive rows with the same value in that column, I want to keep only one.

For example, if column A is the one in which we don't want consecutive duplicates:

input =

    A   B    C
    0   1    1
    0   2    2
    2   1   10
    2   2   20
    0  11  100
    5   2  200

output =

    A   B    C
    0   1    1
    2   1   10
    0  11  100
    5   2  200

Baron Yugovich asked Jul 24 '14 21:07


1 Answer

It's a little tricky, but you could label each run of consecutive equal values in A and keep the first row of each run, something like:

>>> df.groupby((df["A"] != df["A"].shift()).cumsum().values).first()
   A   B    C
1  0   1    1
2  2   1   10
3  0  11  100
4  5   2  200
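
Broken into steps, the one-liner can be sketched as below. This is only an illustration: the frame is rebuilt by hand from the question's example, and the names `starts_new_run`, `run_id`, `by_group` and `by_mask` are made up here for readability. The boolean-mask variant at the end is an alternative that is not part of the original one-liner; it keeps the original index instead of producing the group labels 1, 2, 3, 4.

```python
import pandas as pd

# Example frame rebuilt from the question (column names A, B, C assumed
# from the output shown above).
df = pd.DataFrame({
    "A": [0, 0, 2, 2, 0, 5],
    "B": [1, 2, 1, 2, 11, 2],
    "C": [1, 2, 10, 20, 100, 200],
})

# True wherever A differs from the row above. The first row is compared
# against NaN, so it is always True.
starts_new_run = df["A"] != df["A"].shift()

# The cumulative sum of those flags gives every run of equal A values its
# own label: 1, 1, 2, 2, 3, 4.
run_id = starts_new_run.cumsum()

# Same idea as the one-liner: group on the run label, take the first entry.
by_group = df.groupby(run_id.values).first()

# Alternative: keep only the rows that start a run (original index kept).
by_mask = df[starts_new_run]

print(by_group)
print(by_mask)
```

One difference worth knowing: `first()` returns the first non-null entry per column within each group, so if other columns can contain NaN, the mask variant is the one that returns whole rows unchanged.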

DSM answered Nov 02 '22 13:11