 

Python pandas: remove duplicates in a Series

Tags:

python

pandas

Is there a function to enforce that the index is unique, or is it only possible to handle this in Python itself, by converting to a dict and back or something like that?

As noted in the comments below: pandas is a Python library built on NumPy/SciPy.

Converting with to_dict and back works, but I suspect it gets slow for large Series.

In [24]: a = pandas.Series([1,2,3], index=[1,1,2])

In [25]: a
Out[25]: 
1    1
1    2
2    3

In [26]: a = a.to_dict()

In [27]: a
Out[27]: {1: 2, 2: 3}

In [28]: a = pandas.Series(a)

In [29]: a
Out[29]: 
1    2
2    3
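The dict round-trip keeps the last value for each repeated label, because a later key overwrites an earlier one. A minimal sketch of the same result without leaving pandas, assuming a reasonably recent pandas where Index.duplicated accepts a keep argument (this postdates the question); the variable names are illustrative only:

```python
import pandas as pd

# Same data as the session above: label 1 appears twice.
a = pd.Series([1, 2, 3], index=[1, 1, 2])
print(a.index.is_unique)  # False

# Dict round-trip: for a repeated key the later value overwrites the earlier
# one, so the last occurrence of each label survives.
via_dict = pd.Series(a.to_dict())

# Vectorized alternative: keep="last" flags every occurrence except the last
# one as a duplicate, and ~ inverts the mask to select the survivors.
via_mask = a[~a.index.duplicated(keep="last")]

print(via_mask)
print(via_mask.index.is_unique)  # True
```

The mask version stays inside pandas and never builds an intermediate Python dict, which is why it is the more likely choice for large Series; index.is_unique is a cheap way to check the result.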
asked Oct 18 '12 at 19:10 by mathtick



2 Answers

BTW we plan on adding a drop_duplicates method to Series like DataFrame.drop_duplicates in the near future.
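Series.drop_duplicates does exist in current pandas, but like DataFrame.drop_duplicates it compares values, not index labels, so on its own it does not deduplicate the index. A small sketch contrasting the two, assuming a recent pandas; the example data is made up for illustration:

```python
import pandas as pd

s = pd.Series([1, 1, 2], index=["a", "b", "c"])

# drop_duplicates looks at values: the second 1 (label "b") is dropped
# even though every label is already distinct.
by_value = s.drop_duplicates()

# To deduplicate on the index instead, filter with Index.duplicated.
t = pd.Series([1, 2, 3], index=["a", "a", "b"])
by_label = t[~t.index.duplicated(keep="first")]
```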

answered Sep 20 '22 at 20:09 by Wes McKinney



Use groupby on the index level, then first() or last():

In [279]: s
Out[279]: 
a    1
b    2
b    3
b    4
e    5

In [280]: grouped = s.groupby(level=0)

In [281]: grouped.first()
Out[281]: 
a    1
b    2
e    5

In [282]: grouped.last()
Out[282]: 
a    1
b    4
e    5
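The session does not show how s was built; the sketch below reconstructs it from the printed output (an assumption) and puts the groupby approach next to an Index.duplicated mask. Two differences worth knowing: groupby sorts the group keys by default, and GroupBy.first()/last() skip missing values, whereas the mask keeps rows in their original order and keeps NaN values as-is.

```python
import pandas as pd

# Reconstructed from the printed output above (assumed, not from the answer).
s = pd.Series([1, 2, 3, 4, 5], index=list("abbbe"))

grouped = s.groupby(level=0)
first = grouped.first()  # a 1, b 2, e 5
last = grouped.last()    # a 1, b 4, e 5

# Boolean-mask equivalents: keep the first (or last) row for each label,
# preserving the original label order instead of sorting group keys.
first_mask = s[~s.index.duplicated(keep="first")]
last_mask = s[~s.index.duplicated(keep="last")]
```

Here the labels are already sorted and there are no missing values, so both approaches give the same result.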
answered Sep 22 '22 at 20:09 by root
