Pandas, how to filter a df to get unique entries?

Tags: python, pandas, dataframe, numpy

I have a dataframe like this:

ID  type value
1   A    8
2   A    5
3   B    11
4   C    12
5   D    1
6   D    22
7   D    13

I want to filter the dataframe so that each value of the "type" attribute occurs only once (e.g. A appears only once). If several rows share the same "type", I want to keep the one with the highest value. I want to get something like:

ID  type value
1   A    8
3   B    11
4   C    12
6   D    22

How do I do this with pandas?


asked Jan 28 '14 10:01

Gioelelm

2 Answers

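One way is to sort the dataframe and then take the first row of each group after a groupby.

# first way: sort so the largest value comes first within each type
# (named sorted_df to avoid shadowing the built-in sorted)
sorted_df = df.sort_values(['type', 'value'], ascending=[True, False])

# keep the first row of each type, then turn 'type' back into a column
first = sorted_df.groupby('type').first().reset_index()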
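
As a minimal sketch, the first way can be run end to end on the data from the question. The DataFrame construction below is an assumption about how the table is loaded; the rest follows the approach above.

```python
import pandas as pd

# Sample data copied from the question (construction is illustrative)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "type": ["A", "A", "B", "C", "D", "D", "D"],
    "value": [8, 5, 11, 12, 1, 22, 13],
})

# Sort by type, then by value descending, so the largest value leads each group
sorted_df = df.sort_values(["type", "value"], ascending=[True, False])

# Take the first row per type and restore 'type' as a regular column
first = sorted_df.groupby("type").first().reset_index()
print(first)
```

The resulting columns come out in the order type, ID, value, because reset_index() puts the former group key first.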

Another way does not necessarily keep a single row per type: if several rows share the same maximum value, it keeps all of their IDs instead of just one.

# second way: compute the maximum value per type
grouped = df.groupby('type').agg({'value': 'max'}).reset_index()
grouped = grouped.set_index(['type','value'])

# join back on (type, value) to recover every ID that has the maximum
second = grouped.join(df.set_index(['type', 'value']))

example:

data

ID  type    value
1   A   8
2   A   5
3   B   11
4   C   12
5   D   1
6   D   22
7   D   13
8   D   22

first method results in

type  ID  value
A   1      8
B   3     11
C   4     12
D   6     22

second method keeps ID=8

            ID
type value    
A    8       1
B    11      3
C    12      4
D    22      6
     22      8

(you can reset_index() again here if you don't like the multiindex)
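
A self-contained sketch of the second way on the example data with the tie (ID 6 and ID 8 both have value 22), including the final reset_index(). The last statement is an alternative that is not part of this answer: a boolean mask built with groupby(...).transform("max"), which should select the same rows.

```python
import pandas as pd

# Example data from this answer, where type D has two rows with the maximum value
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "type": ["A", "A", "B", "C", "D", "D", "D", "D"],
    "value": [8, 5, 11, 12, 1, 22, 13, 22],
})

# Maximum value per type, indexed by (type, value)
grouped = df.groupby("type").agg({"value": "max"}).reset_index()
grouped = grouped.set_index(["type", "value"])

# Join back to pick up every ID matching a (type, max value) pair
second = grouped.join(df.set_index(["type", "value"]))

# Flatten the MultiIndex into ordinary columns
flat = second.reset_index()
print(flat)

# Alternative (not from the answer): keep rows whose value equals their group's maximum
same_rows = df[df["value"] == df.groupby("type")["value"].transform("max")]
```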


answered Oct 13 '22 04:10

mkln

df[['type', 'value']].drop_duplicates(subset=['type'])

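This works in general, but drop_duplicates keeps the first row it encounters for each type, so sort by value in descending order beforehand if you need the row with the highest value. If you have more columns, select the ones you are interested in; here we selected 'type' and 'value'.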
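
A minimal sketch of this approach on the question's data, assuming the goal is the highest value per type. The sort_values step is an addition to the one-liner above, since drop_duplicates uses keep='first' by default; the final sort by ID only restores the original row order.

```python
import pandas as pd

# Sample data copied from the question (construction is illustrative)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "type": ["A", "A", "B", "C", "D", "D", "D"],
    "value": [8, 5, 11, 12, 1, 22, 13],
})

# Without sorting, the first row seen for each type is kept (ID 5 for type D)
unsorted_result = df.drop_duplicates(subset=["type"])

# Sorting by value descending first makes the kept row the one with the highest value
result = (
    df.sort_values("value", ascending=False)
      .drop_duplicates(subset=["type"])
      .sort_values("ID")
)
print(result)
```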

answered Oct 13 '22 04:10

vesszabo
