I have a DataFrame df in which the Id column contains duplicates:
Index  Id  Type
0      a1  A
1      a2  A
2      b1  B
3      b3  B
4      a1  A
...
When I use:
uniqueId = df["Id"].unique()
I get a list of unique IDs.
How can I apply this filtering to the whole DataFrame, so that it keeps its structure but the duplicate rows (based on "Id") are removed?
unique() returns only a NumPy array of the distinct values in the column, so it cannot preserve the rest of each row. It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies which column(s) to test for duplicates:
# keep first duplicate value
df = df.drop_duplicates(subset=['Id'])
print(df)

      Id Type
Index
0     a1    A
1     a2    A
2     b1    B
3     b3    B
# keep last duplicate value
df = df.drop_duplicates(subset=['Id'], keep='last')
print(df)

      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
4     a1    A
# remove all duplicate values
df = df.drop_duplicates(subset=['Id'], keep=False)
print(df)

      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
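For reference, here is a minimal, self-contained sketch that rebuilds the example DataFrame from the question and runs all three variants; naming the index "Index" is an assumption taken from the printed output above.

import pandas as pd

# Rebuild the example data from the question; naming the index
# "Index" is an assumption based on the printed output above.
df = pd.DataFrame({'Id': ['a1', 'a2', 'b1', 'b3', 'a1'],
                   'Type': ['A', 'A', 'B', 'B', 'A']})
df.index.name = 'Index'

print(df.drop_duplicates(subset=['Id']))               # keep='first' is the default
print(df.drop_duplicates(subset=['Id'], keep='last'))  # keep the last occurrence
print(df.drop_duplicates(subset=['Id'], keep=False))   # drop every duplicated Id

An equivalent way to get the default keep='first' behaviour is boolean masking with df[~df['Id'].duplicated()], which can be handy when you want to combine the duplicate test with other row filters.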