I have this:
Date value
0 1975 a
21 1975 b
1 1976 b
22 1976 c
3 1977 a
2 1977 b
4 1978 c
25 1978 d
5 1979 e
26 1979 f
6 1980 a
27 1980 f
I am having trouble finding a way to keep only the lines containing the first occurrence of a 'value'. I want to drop duplicate 'values', keeping the row with the lowest 'Date'.The end result should be:
Date value
0 1975 a
21 1975 b
22 1976 c
25 1978 d
5 1979 e
26 1979 f
The first occurrence is kept and the rest of the duplicates are deleted.
And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df. drop_duplicates(subset=['col1', 'col2', ...])
The iloc() function in python is one of the functions defined in the Pandas module that helps us to select a specific row or column from the data set. Using the iloc() function in python, we can easily retrieve any particular value from a row or column using index values.
To make a bit more explicit what Quazi posted: drop_duplicates()
is what you need. By default, it keeps the first occurence and drops everything thereafter - look at the manual for more information. So, to be sure, you should do
>>> dataframe = oldDf.sort('Date').drop_duplicates(subset=['value'])
>>> dataframe
Out[490]:
Date value
0 1975 a
21 1975 b
22 1976 c
25 1978 d
5 1979 e
26 1979 f
df.drop_duplicates(subset=['value'], inplace=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With