I have this:
    Date value
0   1975     a
21  1975     b
1   1976     b
22  1976     c
3   1977     a
2   1977     b
4   1978     c
25  1978     d
5   1979     e
26  1979     f
6   1980     a
27  1980     f
I am having trouble finding a way to keep only the lines containing the first occurrence of a 'value'. I want to drop duplicate 'values', keeping the row with the lowest 'Date'.The end result should be:
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f
                The first occurrence is kept and the rest of the duplicates are deleted.
And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df. drop_duplicates(subset=['col1', 'col2', ...])
The iloc() function in python is one of the functions defined in the Pandas module that helps us to select a specific row or column from the data set. Using the iloc() function in python, we can easily retrieve any particular value from a row or column using index values.
To make a bit more explicit what Quazi posted: drop_duplicates() is what you need. By default, it keeps the first occurence and drops everything thereafter - look at the manual for more information. So, to be sure, you should do
>>> dataframe = oldDf.sort('Date').drop_duplicates(subset=['value'])
>>> dataframe
Out[490]: 
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f
                        df.drop_duplicates(subset=['value'], inplace=True)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With