I have a dataframe like this:
ID type value
1 A 8
2 A 5
3 B 11
4 C 12
5 D 1
6 D 22
7 D 13
I want to filter the dataframe so that each value of the "type" attribute occurs only once (e.g. A appears only once), and if several rows have the same value for "type" I want to keep the one with the highest value. I want to get something like:
ID type value
1 A 8
3 B 11
4 C 12
6 D 22
How do I do this with pandas?
You can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df.drop_duplicates(subset=['col1', 'col2', ...])
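As a minimal sketch of that syntax, assuming a small made-up DataFrame (the column names col1, col2 and col3 and the data are placeholders, not taken from the question):

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 share the same (col1, col2) pair.
df = pd.DataFrame({
    "col1": ["x", "y", "x"],
    "col2": [1, 2, 1],
    "col3": [10, 20, 30],
})

# Keep one row per distinct (col1, col2) combination.
# By default the first occurrence is kept (keep="first").
deduped = df.drop_duplicates(subset=["col1", "col2"])
print(deduped)
```

Because the first occurrence is the one kept, the order of the rows decides which duplicate survives.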
You can get the unique values of one or more DataFrame columns with pandas.unique() or Series.unique(). Series.unique() returns the distinct values of a single column; pandas.unique() accepts any 1-D array-like, so values from several columns have to be combined into one 1-D sequence before being passed to it.
A pandas Series (that is, a DataFrame column) has a unique() method that returns only the distinct values of that column; applied to a FirstName column, for example, it returns each first name once. This can be extended to several columns by using pandas concat() to combine the desired columns into a single column and then calling unique() on the result.
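A short sketch of both cases, assuming a hypothetical two-column table (the names people, FirstName and LastName and the data are illustrative only):

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
people = pd.DataFrame({
    "FirstName": ["Ann", "Bob", "Ann"],
    "LastName": ["Lee", "Ann", "Ray"],
})

# Distinct values of a single column, in order of first appearance.
first_names = people["FirstName"].unique()

# Distinct values across two columns: stack them into one Series first.
combined = pd.concat([people["FirstName"], people["LastName"]], ignore_index=True)
all_names = combined.unique()

print(list(first_names))
print(list(all_names))
```

Note that unique() works on values, not rows, so it does not help with picking one row per group; that is what the groupby and drop_duplicates approaches are for.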
One way is to sort the dataframe and then take the first row of each group after a groupby.
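# first way: sort by type, then by value (descending), and keep the first row of each type
# (named sorted_df to avoid shadowing the built-in sorted)
sorted_df = df.sort_values(['type', 'value'], ascending=[True, False])
first = sorted_df.groupby('type').first().reset_index()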
Another way does not necessarily keep only one row per type: if several rows share the same maximum value, it keeps all of their IDs (rather than just one of them).
# second way: find the maximum value per type, then join back to the original rows
grouped = df.groupby('type').agg({'value': 'max'}).reset_index()
grouped = grouped.set_index(['type', 'value'])
second = grouped.join(df.set_index(['type', 'value']))
example:
data
ID type value
1 A 8
2 A 5
3 B 11
4 C 12
5 D 1
6 D 22
7 D 13
8 D 22
first method results in
type ID value
A 1 8
B 3 11
C 4 12
D 6 22
second method keeps ID=8
            ID
type value
A    8       1
B    11      3
C    12      4
D    22      6
     22      8
(you can call reset_index() again here if you don't like the MultiIndex)
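For reference, here is a self-contained sketch that runs both approaches on the eight-row example data. The variable names ordered and max_per_type are my own, and the final reset_index() is the optional flattening step mentioned above:

```python
import pandas as pd

# The eight-row example data from above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "type": ["A", "A", "B", "C", "D", "D", "D", "D"],
    "value": [8, 5, 11, 12, 1, 22, 13, 22],
})

# First way: sort, then keep the first row of each group (one row per type).
ordered = df.sort_values(["type", "value"], ascending=[True, False])
first = ordered.groupby("type").first().reset_index()

# Second way: compute the per-type maximum, then join back to the
# original rows so that every row tied for the maximum is kept.
max_per_type = df.groupby("type").agg({"value": "max"}).reset_index()
second = (
    max_per_type.set_index(["type", "value"])
    .join(df.set_index(["type", "value"]))
    .reset_index()
)

print(first)
print(second)
```

The first result has four rows, one per type, while the second has five because both D rows with value 22 survive the join.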
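df[['type', 'value']].sort_values('value', ascending=False).drop_duplicates(subset=['type'])
This works in general, provided the rows are sorted by value in descending order first, because drop_duplicates keeps the first row it finds for each type. If you have more columns, you can select just the columns you are interested in; in our case we selected 'type' and 'value'.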
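To see the effect of row order on which duplicate survives, here is a minimal sketch on the question's seven-row data. It keeps all columns, including ID, and compares the result with and without a descending sort on value; the names unsorted_result and result are my own:

```python
import pandas as pd

# The seven-row data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "type": ["A", "A", "B", "C", "D", "D", "D"],
    "value": [8, 5, 11, 12, 1, 22, 13],
})

# Without sorting, the first row per type is kept: for D that is value 1.
unsorted_result = df.drop_duplicates(subset=["type"])

# Sorting by value (descending) first makes the kept row the maximum;
# the final sort by ID only restores the original row order.
result = (
    df.sort_values("value", ascending=False)
    .drop_duplicates(subset=["type"])
    .sort_values("ID")
)

print(unsorted_result)
print(result)
```

With the sort in place, the result contains IDs 1, 3, 4 and 6, which matches the output requested in the question.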