How to count duplicate rows in pandas dataframe?

Tags:

python

pandas

I am trying to count the duplicates of each type of row in my dataframe. For example, say that I have a dataframe in pandas as follows:

import pandas as pd

df = pd.DataFrame({'one': pd.Series([1., 1, 1]),
                   'two': pd.Series([1., 2., 1])})

I get a df that looks like this:

   one  two
0    1    1
1    1    2
2    1    1

I imagine the first step is to find all the unique rows, which I do with:

df.drop_duplicates()

This gives me the following df:

   one  two
0    1    1
1    1    2
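As a side note, drop_duplicates() is closely related to duplicated(), which returns a boolean mask instead of a filtered frame. The minimal sketch below (reusing the small example above, built from plain lists for brevity) shows that filtering on the mask reproduces drop_duplicates(), and that summing the mask counts only the surplus copies, which is why it does not directly give the per-row counts asked for here.

```python
import pandas as pd

df = pd.DataFrame({'one': [1., 1, 1], 'two': [1., 2., 1]})

# duplicated() flags every row that repeats an earlier row (keep='first' by default)
mask = df.duplicated()
print(mask.tolist())  # [False, False, True]

# Keeping only the non-flagged rows gives the same result as drop_duplicates()
unique = df[~mask]
print(unique.equals(df.drop_duplicates()))  # True

# Summing the mask counts the surplus copies, not how often each row appears
print(int(mask.sum()))  # 1
```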

Now I want to take each row from the above df ([1 1] and [1 2]) and count how many times each one appears in the initial df. My result would look something like this:

Row     Count
[1 1]   2
[1 2]   1

How should I go about doing this last step?
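For reference, the two-step approach described in the question can be finished literally: take each distinct row from drop_duplicates() and count the rows of the original frame that match it in every column. This is a minimal sketch for illustration only; it scans the whole frame once per distinct row, and rows containing NaN never match themselves because NaN != NaN. The name result is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'one': [1., 1, 1], 'two': [1., 2., 1]})

# Step 1 (from the question): the distinct rows
unique = df.drop_duplicates()

# Step 2: for each distinct row, count rows of df equal to it in every column.
# df == row aligns the row's labels with df's columns.
counts = [int((df == row).all(axis=1).sum()) for _, row in unique.iterrows()]

result = unique.assign(Count=counts).reset_index(drop=True)
print(result)
```

The grouping-based answers below do the same job in a single pass and scale much better.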

Edit:

Here's a larger example to make it more clear:

df = pd.DataFrame({'one': pd.Series([True, True, True, False]),
                   'two': pd.Series([True, False, False, True]),
                   'three': pd.Series([True, False, False, False])})

gives me:

     one  three    two
0   True   True   True
1   True  False  False
2   True  False  False
3  False  False   True

I want a result that tells me:

Row                   Count
[True True True]      1
[True False False]    2
[False False True]    1
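If the goal is exactly this "row → count" mapping, one pandas-free option is to turn each row into a tuple and count the tuples with collections.Counter. A minimal sketch, assuming a recent pandas where the column order follows the dict insertion order (one, two, three) rather than the alphabetical order shown in the printout above:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'one': [True, True, True, False],
                   'two': [True, False, False, True],
                   'three': [True, False, False, False]})

# itertuples(index=False, name=None) yields each row as a plain tuple,
# with values in the order of df.columns
row_counts = Counter(df.itertuples(index=False, name=None))

for row, count in row_counts.items():
    print(list(row), count)
```

Because the tuples are hashable, Counter handles the counting; the trade-off is that the result is a plain dict rather than a DataFrame.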

460

asked Feb 23 '16 17:02

jss367

2 Answers

You can groupby on all the columns and call size; the index then shows each distinct row and the value shows how many times it occurs:

In [28]: df.groupby(df.columns.tolist(), as_index=False).size()
Out[28]:
one    three  two
False  False  True     1
True   False  False    2
       True   True     1
dtype: int64
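The output above comes from an older pandas, where size() returned a Series even with as_index=False. To my knowledge, since pandas 1.1 passing as_index=False makes size() return a DataFrame with the counts in a column named size, while the default as_index=True still gives the Series form shown above. A minimal sketch of both, using the larger example from the question:

```python
import pandas as pd

df = pd.DataFrame({'one': [True, True, True, False],
                   'two': [True, False, False, True],
                   'three': [True, False, False, False]})

cols = df.columns.tolist()

# Default as_index=True: a Series of counts indexed by the distinct rows
counts = df.groupby(cols).size()
print(counts)

# as_index=False (assumes pandas >= 1.1): a DataFrame with a 'size' column
counts_df = df.groupby(cols, as_index=False).size()
print(counts_df)
```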

answered Oct 02 '22 12:10

EdChum

df.groupby(df.columns.tolist()).size().reset_index().\
    rename(columns={0: 'records'})

   one  two  records
0    1    1        2
1    1    2        1
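Two shorter variants of the same idea, sketched under the assumption of pandas 1.1 or later: Series.reset_index(name=...) names the count column directly, so the rename() step is not needed, and DataFrame.value_counts() counts distinct rows in one call, sorted by count in descending order.

```python
import pandas as pd

df = pd.DataFrame({'one': [1., 1, 1], 'two': [1., 2., 1]})
cols = df.columns.tolist()

# reset_index(name=...) turns the count Series into a column called 'records'
records = df.groupby(cols).size().reset_index(name='records')
print(records)

# DataFrame.value_counts counts each distinct row; most frequent rows first
vc = df.value_counts().reset_index(name='records')
print(vc)
```

Note that both groupby and value_counts skip rows containing NaN by default; groupby(..., dropna=False) and value_counts(dropna=False) keep them in recent pandas versions.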

answered Oct 02 '22 11:10

Denis

Related questions
                            
                                Syntax highlighting in vim for python
                            
                                How to run Python script on terminal?
                            
                                Calculate mean across dimension in a 2D array
                            
                                What is the difference between subprocess.popen and subprocess.run
                            
                                pretty-print json in python (pythonic way)
                            
                                sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings
                            
                                How to config nltk data directory from code?
                            
                                How to import a module in Python with importlib.import_module
                            
                                Default arguments with *args and **kwargs
                            
                                set random seed programwide in python
                            
                                In Flask, what is "request.args" and how is it used?
                            
                                python JSON only get keys in first level
                            
                                OpenCV & Python - Image too big to display
                            
                                How do I compare two strings in python?
                            
                                Accessing a class' member variables in Python?
                            
                                Looking for a good Python Tree data structure [closed]
                            
                                How do I verify that a string only contains letters, numbers, underscores and dashes?
                            
                                How to convert a decimal number into fraction?
                            
                                How does one convert a grayscale image to RGB in OpenCV (Python)?
                            
                                Why are there no sorted containers in Python's standard libraries?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With