Python Pandas Dropping First in Series of Duplicates

Tags:

python

pandas

What is the most Pythonic way to drop only the first in a series of duplicates?

I have a dataframe:

Group    Value
  a        0
  a        1
  a        2
  b        6
  b        7
  b        8

and I want the following result:

Group    Value
  a        1
  a        2
  b        7
  b        8

drop_duplicates keeps either the first or the last item, depending on its keep argument. I want to drop the first occurrence of each duplicated value and keep the rest.
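To make the problem concrete, here is a small sketch that rebuilds the sample frame from the question and runs drop_duplicates with each of its keep options. None of them gives the wanted result: keep="first" and keep="last" retain exactly one row per group, and keep=False discards every row that has a duplicate, which here is all of them.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Group": list("aaabbb"), "Value": [0, 1, 2, 6, 7, 8]})

# keep="first" (the default) retains only the first row of each group
first_kept = df.drop_duplicates(subset="Group", keep="first")
# keep="last" retains only the last row of each group
last_kept = df.drop_duplicates(subset="Group", keep="last")
# keep=False drops every row whose Group value appears more than once
none_kept = df.drop_duplicates(subset="Group", keep=False)

print(first_kept)
print(last_kept)
print(none_kept)
```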

Windstorm1981 asked Dec 19 '22 02:12


2 Answers

Use duplicated() to create a boolean mask and filter based on it:

df[df.Group.duplicated()]

#Group  Value
#1   a      1
#2   a      2
#4   b      7
#5   b      8

By default, duplicated marks every occurrence of a value as True except the first one:

df.Group.duplicated()

#0    False
#1     True
#2     True
#3    False
#4     True
#5     True
#Name: Group, dtype: bool
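A self-contained sketch of why this works, using the data from the question: drop_duplicates keeps exactly the rows where the duplicated mask is False, so filtering with the mask itself (instead of its negation) selects the complement, i.e. everything except the first row of each group.

```python
import pandas as pd

df = pd.DataFrame({"Group": list("aaabbb"), "Value": [0, 1, 2, 6, 7, 8]})

# keep="first" is the default: the first occurrence is False, later ones True
mask = df["Group"].duplicated()

# ~mask selects the same rows that drop_duplicates keeps ...
kept_by_drop = df.drop_duplicates(subset="Group")

# ... so mask selects the rows drop_duplicates would remove
result = df[mask]
print(result)
```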

To also keep groups that contain only a single row (an edge case; this version is less efficient):

df[df.Group.duplicated() | df.Group.groupby(df.Group).transform('count').eq(1)]

# Group Value
#1    a     1
#2    a     2
#4    b     7
#5    b     8
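The question's data has no single-row group, so the output above is identical to the plain duplicated() filter. The sketch below reuses the extended data from the second answer (an extra group c with one row) to show the difference: the plain mask drops c entirely, while OR-ing in a "group count equals 1" mask keeps it.

```python
import pandas as pd

# Extended data from the second answer: group "c" has a single row
df = pd.DataFrame({"Group": list("aaabbbc"), "Value": [0, 1, 2, 6, 7, 8, 11]})

# Plain mask: the only row of group "c" is its first occurrence, so it is dropped
plain = df[df.Group.duplicated()]

# Broadcast each group's row count back to its rows and flag singletons
single = df.Group.groupby(df.Group).transform("count").eq(1)
with_singletons = df[df.Group.duplicated() | single]

print(plain)
print(with_singletons)
```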

Or:

df[df.Group.groupby(df.Group).transform(lambda x: (x.size == 1) | x.duplicated())]
# Group  Value
#1    a      1
#2    a      2
#4    b      7
#5    b      8
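A possible variant that avoids calling a Python lambda once per group, swapping in groupby().cumcount() (not used in the answer above): cumcount numbers the rows within each group 0, 1, 2, ..., so "cumcount > 0" is another way to express "not the first row of its group". This is a sketch on the same extended data, cross-checked against the lambda version.

```python
import pandas as pd

df = pd.DataFrame({"Group": list("aaabbbc"), "Value": [0, 1, 2, 6, 7, 8, 11]})

g = df.groupby("Group")
# Position of each row within its group; 0 marks the first row
not_first = g.cumcount() > 0
# Each row gets the size of its group; flag groups with exactly one row
singleton = g["Value"].transform("size") == 1
result = df[not_first | singleton]

# Cross-check against the lambda-based transform from the answer
via_lambda = df[df.Group.groupby(df.Group).transform(lambda x: (x.size == 1) | x.duplicated())]
print(result)
```

Both masks are built with vectorized groupby operations, which is usually faster than a per-group lambda on frames with many groups, though that has not been benchmarked here.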

Psidom answered Dec 27 '22 01:12


If a group has only one row and you want to keep it:

df.groupby('Group').Value.apply(lambda x : x.iloc[1:] if len(x)>1 else x).reset_index('Group')
Out[144]: 
  Group  Value
1     a      1
2     a      2
4     b      7
5     b      8
6     c     11

Input data (with an extra single-row group c):

df
Out[138]: 
  Group  Value
0     a      0
1     a      1
2     a      2
3     b      6
4     b      7
5     b      8
6     c     11
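For reference, a self-contained version of the call above with comments. groupby(...).apply returns a Series indexed by (Group, original row label); reset_index("Group") moves the group level back into a column and leaves the original row labels as the index.

```python
import pandas as pd

df = pd.DataFrame({"Group": list("aaabbbc"), "Value": [0, 1, 2, 6, 7, 8, 11]})

# Within each group, drop the first row unless the group has only one row
trimmed = df.groupby("Group").Value.apply(lambda x: x.iloc[1:] if len(x) > 1 else x)

# Turn the "Group" index level back into a column
out = trimmed.reset_index("Group")
print(out)
```

Because apply runs on the Value column only, any other columns of df would not appear in the result; the boolean-mask approaches in the first answer filter whole rows and keep every column.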

BENY answered Dec 27 '22 00:12