Python Pandas: Keeping only dataframe rows containing first occurrence of an item

Tags:

pandas

I have this:

    Date value
0   1975     a
21  1975     b
1   1976     b
22  1976     c
3   1977     a
2   1977     b
4   1978     c
25  1978     d
5   1979     e
26  1979     f
6   1980     a
27  1980     f

I am having trouble finding a way to keep only the rows containing the first occurrence of each 'value'. I want to drop duplicate 'value' entries, keeping the row with the lowest 'Date'. The end result should be:

    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f

asked Jun 10 '14 08:06

DIGSUM

2 Answers

To make a bit more explicit what Quazi posted: drop_duplicates() is what you need. By default, it keeps the first occurrence of each duplicate and drops every later one; see the pandas documentation for more information. That only gives you the lowest 'Date' if the rows are already in 'Date' order, so to be sure, sort first:

>>> dataframe = oldDf.sort_values('Date').drop_duplicates(subset=['value'])
>>> dataframe
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f
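
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same sort-then-deduplicate idea. The names old_df and first_seen are mine, not from the question, and kind="mergesort" is an optional extra I added so that rows sharing the same 'Date' keep their original relative order.

```python
import pandas as pd

# Rebuild the frame from the question, keeping its original index labels.
old_df = pd.DataFrame(
    {
        "Date": [1975, 1975, 1976, 1976, 1977, 1977,
                 1978, 1978, 1979, 1979, 1980, 1980],
        "value": ["a", "b", "b", "c", "a", "b",
                  "c", "d", "e", "f", "a", "f"],
    },
    index=[0, 21, 1, 22, 3, 2, 4, 25, 5, 26, 6, 27],
)

# Stable sort by Date so the earliest row for each value comes first,
# then keep only the first row seen for each value.
first_seen = (
    old_df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"], keep="first")
)

print(first_seen)
```

keep="first" is already the default; it is spelled out here only to make the intent obvious.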

answered Sep 29 '22 01:09

FooBar

df.drop_duplicates(subset=['value'], inplace=True)
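
One caveat, sketched on a small made-up frame (the data and the names unsorted_result and sorted_result are hypothetical, not from the question): drop_duplicates keeps the first row in the frame's current order, so this one-liner only returns the lowest 'Date' when the frame is already sorted by 'Date'.

```python
import pandas as pd

# Hypothetical frame whose rows are NOT in Date order.
df = pd.DataFrame({"Date": [1980, 1975, 1977], "value": ["a", "a", "b"]})

# Without sorting, the first row in frame order wins: 1980 is kept for "a".
unsorted_result = df.drop_duplicates(subset=["value"])

# Sorting by Date first makes the earliest row win: 1975 is kept for "a".
sorted_result = df.sort_values("Date").drop_duplicates(subset=["value"])

print(unsorted_result)
print(sorted_result)
```

Note that with inplace=True the method modifies df and returns None, so its result should not be assigned to a variable.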

answered Sep 29 '22 01:09

Quazi Farhan
