
Python Pandas: Keeping only dataframe rows containing first occurrence of an item

Tags: python, pandas

I have this:

    Date value
0   1975     a
21  1975     b
1   1976     b
22  1976     c
3   1977     a
2   1977     b
4   1978     c
25  1978     d
5   1979     e
26  1979     f
6   1980     a
27  1980     f

I am having trouble finding a way to keep only the rows containing the first occurrence of each 'value'. I want to drop rows with a duplicate 'value', keeping the row with the lowest 'Date'. The end result should be:

    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f
DIGSUM asked Jun 10 '14 08:06



2 Answers

To make what Quazi posted a bit more explicit: drop_duplicates() is what you need. By default, it keeps the first occurrence of each value and drops every later one; see the pandas documentation for details. Because "first" means first in row order, sort by 'Date' beforehand to be sure the earliest row is the one that is kept:

>>> # older pandas releases spelled this oldDf.sort('Date'); current ones use sort_values
>>> dataframe = oldDf.sort_values('Date').drop_duplicates(subset=['value'])
>>> dataframe
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f
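
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question's table, and the names `old_df` and `first_seen` are only illustrative. It also assumes a stable sort (`kind="mergesort"`) is wanted, so that rows sharing a date keep their original relative order; the answer above does not specify a sort kind.

```python
import pandas as pd

# Rebuild the question's frame, including its non-contiguous index labels.
old_df = pd.DataFrame(
    {
        "Date": [1975, 1975, 1976, 1976, 1977, 1977,
                 1978, 1978, 1979, 1979, 1980, 1980],
        "value": ["a", "b", "b", "c", "a", "b",
                  "c", "d", "e", "f", "a", "f"],
    },
    index=[0, 21, 1, 22, 3, 2, 4, 25, 5, 26, 6, 27],
)

# Sort by Date with a stable algorithm, then keep the first row per value.
first_seen = (
    old_df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"], keep="first")
)

print(first_seen)
```

The surviving index labels should be 0, 21, 22, 25, 5 and 26, matching the result requested in the question.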
FooBar answered Sep 29 '22 01:09


df.drop_duplicates(subset=['value'], inplace=True)
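
Two things about this one-liner are easy to miss: it keeps the first row in row order, which is the earliest date only if the frame is already sorted by 'Date', and with inplace=True the call returns None. The sketch below illustrates both on a small made-up frame (the data and variable names are hypothetical, not taken from the question).

```python
import pandas as pd

# Hypothetical frame whose rows are NOT in date order.
df = pd.DataFrame(
    {"Date": [1977, 1975, 1976, 1975], "value": ["a", "a", "b", "b"]}
)

# Without sorting, "first" means first in row order: the 1977 row is kept for 'a'.
unsorted_result = df.drop_duplicates(subset=["value"])

# Sorting by Date first keeps the earliest row for each value.
sorted_result = (
    df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"])
)

# inplace=True mutates df and returns None, so do not assign its result.
returned = df.drop_duplicates(subset=["value"], inplace=True)

print(unsorted_result)
print(sorted_result)
print(returned)
```

If the frame might be unsorted, chaining sort_values before drop_duplicates is the safer habit.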
Quazi Farhan answered Sep 29 '22 01:09