Is there a way to conditionally drop duplicates (using drop_duplicates specifically) in a pandas DataFrame with about 10 columns and 400,000 rows? That is, I want to keep all rows where two columns meet a condition: if the combination of date (column) and store number (column) is unique, keep the row; otherwise, drop it.
The drop_duplicates() function returns a Series or DataFrame with duplicate values removed. Its keep parameter controls which occurrences survive: 'first' drops duplicates except the first occurrence, 'last' drops duplicates except the last occurrence, and False drops all duplicates.
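A quick illustration of the three keep modes on a small Series (example data made up for demonstration):

In [1]: import pandas as pd

In [2]: s = pd.Series(['a', 'b', 'a', 'c', 'a'])

In [3]: s.drop_duplicates()  # keep='first' is the default
Out[3]:
0    a
1    b
3    c
dtype: object

In [4]: s.drop_duplicates(keep='last')
Out[4]:
1    b
3    c
4    a
dtype: object

In [5]: s.drop_duplicates(keep=False)  # drop every value that repeats
Out[5]:
1    b
3    c
dtype: object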
Use drop_duplicates, which returns a DataFrame with duplicate rows removed, optionally considering only a subset of columns.
Let the initial dataframe be:
In [34]: df
Out[34]:
  Col1 Col2  Col3
0    A    B    10
1    A    B    20
2    A    C    20
3    C    B    20
4    A    B    20
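For reference, this example frame can be built as follows (a minimal sketch, not part of the original answer):

df = pd.DataFrame({'Col1': ['A', 'A', 'A', 'C', 'A'],
                   'Col2': ['B', 'B', 'C', 'B', 'B'],
                   'Col3': [10, 20, 20, 20, 20]})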
If you want unique combinations of certain columns, say 'Col1' and 'Col2':
In [35]: df.drop_duplicates(['Col1', 'Col2'])
Out[35]:
  Col1 Col2  Col3
0    A    B    10
2    A    C    20
3    C    B    20
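The keep parameter combines with a column subset as well; for example, keep='last' retains row 4 rather than row 0 for the duplicated ('A', 'B') pair:

df.drop_duplicates(['Col1', 'Col2'], keep='last')

which returns rows 2, 3 and 4 instead.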
If you want unique combinations across all columns:
In [36]: df.drop_duplicates()
Out[36]:
  Col1 Col2  Col3
0    A    B    10
1    A    B    20
2    A    C    20
3    C    B    20
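Applied to the original question, assuming the relevant columns are named 'date' and 'store' (names inferred from the question; adjust to the actual column names):

df = df.drop_duplicates(subset=['date', 'store'])

This keeps the first row for each (date, store) combination and drops the rest. If instead every row whose (date, store) pair occurs more than once should be dropped entirely, pass keep=False:

df = df.drop_duplicates(subset=['date', 'store'], keep=False)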