How to drop the extra copies of duplicate index entries in a pandas Series?

Tags:

python

pandas

I have a Series s with duplicate index entries:

>>> s
STK_ID  RPT_Date
600809  20061231    demo_str
        20070331    demo_str
        20070630    demo_str
        20070930    demo_str
        20071231    demo_str
        20060331    demo_str
        20060630    demo_str
        20060930    demo_str
        20061231    demo_str
        20070331    demo_str
        20070630    demo_str
Name: STK_Name, Length: 11

I want to keep the unique rows and only one copy of each duplicated row. I tried:

s[s.index.unique()]

Pandas 0.10.1.dev-f7f7e13 gives the following error:

>>> s[s.index.unique()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 515, in __getitem__
    return self._get_with(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 558, in _get_with
    return self.reindex(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 2361, in reindex
    level=level, limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2063, in reindex
    limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2021, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
>>>

So how can I drop the extra duplicate rows of a Series, keeping the unique rows and only one copy of each duplicated row, in an efficient way (preferably in one line)?

asked Jan 18 '13 09:01

bigbug

2 Answers

You can groupby the index and apply a function that returns one value per index group. Here, I take the first value:

In [1]: s = Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [2]: s
Out[2]:
1    0
2    1
2    2
2    3
5    4
6    5
7    6
7    7
7    8
8    9

In [3]: s.groupby(s.index).first()
Out[3]:
1    0
2    1
5    4
6    5
7    6
8    9

UPDATE

Addressing BigBug's comment about crashing when passing a MultiIndex to Series.groupby():

In [1]: s
Out[1]:
STK_ID  RPT_Date
600809  20061231    demo
        20070331    demo
        20070630    demo
        20070331    demo

In [2]: s.reset_index().groupby(s.index.names).first()
Out[2]:
                    0
STK_ID RPT_Date
600809 20061231  demo
       20070331  demo
       20070630  demo
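
As a hedged sketch for more recent pandas versions (an assumption; the question was asked against 0.10.1.dev), grouping on the index levels directly avoids the reset_index() round trip and returns a Series rather than a DataFrame. The sample values "a" to "d" are made up for illustration so that the surviving rows are visible:

```python
import pandas as pd

# Illustrative data: the fourth entry repeats the (600809, 20070331) key.
index = pd.MultiIndex.from_tuples(
    [
        (600809, 20061231),
        (600809, 20070331),
        (600809, 20070630),
        (600809, 20070331),
    ],
    names=["STK_ID", "RPT_Date"],
)
s = pd.Series(["a", "b", "c", "d"], index=index, name="STK_Name")

# Group on every index level and keep the first value in each group.
# Note: groupby sorts the group keys by default.
deduped = s.groupby(level=list(s.index.names)).first()
print(deduped)
```

Because groupby sorts the keys by default, the result comes back ordered by index rather than in the original row order.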

answered Oct 25 '22 14:10

Zelazny7

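You can subset the Series with a boolean mask built from the index's duplicated method, which flags every repeat of an index value after its first occurrence (keeping the first is the default). Using @Zelazny7's example: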

s = pd.Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])

In [130]: s[~s.index.duplicated()]
Out[130]: 
1    0
2    1
5    4
6    5
7    6
8    9
dtype: int64
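
The same mask should also work on the MultiIndex from the question. The sketch below assumes a pandas version in which Index.duplicated accepts the keep argument; the values "a" to "d" are made up for illustration:

```python
import pandas as pd

# Illustrative data: the fourth entry repeats the (600809, 20070331) key.
index = pd.MultiIndex.from_tuples(
    [
        (600809, 20061231),
        (600809, 20070331),
        (600809, 20070630),
        (600809, 20070331),
    ],
    names=["STK_ID", "RPT_Date"],
)
s = pd.Series(["a", "b", "c", "d"], index=index, name="STK_Name")

# keep="first" (the default) drops later repeats of an index entry.
keep_first = s[~s.index.duplicated(keep="first")]

# keep="last" drops earlier repeats instead.
keep_last = s[~s.index.duplicated(keep="last")]

print(keep_first)
print(keep_last)
```

Unlike the groupby approach, boolean masking preserves the original row order and does not sort the index.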

answered Oct 25 '22 16:10

Anton Protopopov
