I have a pandas.DataFrame with a column called name containing strings. I would like to get a list of the names which occur more than once in the column. How do I do that?
I tried:
funcs_groups = funcs.groupby(funcs.name)
funcs_groups[(funcs_groups.count().name>1)]
But it doesn't filter out the singleton names.
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series that marks each row as either a duplicate (True) or not (False).
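As a rough sketch of what that boolean Series looks like, here is duplicated() restricted to a single column with each of the three keep settings. The small DataFrame is made up for illustration only.

```python
import pandas as pd

# Made-up sample data, only for illustration
df = pd.DataFrame({"name": ["willy", "willy", "zoe"], "age": [10, 11, 10]})

# keep="first" (the default): every repeat after the first occurrence is True
first = df.duplicated("name")

# keep="last": every repeat except the last occurrence is True
last = df.duplicated("name", keep="last")

# keep=False: all rows whose name appears more than once are True
all_dups = df.duplicated("name", keep=False)

print(first.tolist(), last.tolist(), all_dups.tolist())
```

The keep=False variant is usually the one you want when the goal is "everything that is repeated" rather than "everything after the first sighting".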
We can use the pandas built-in method drop_duplicates() to drop duplicate rows. By default, this method returns a new DataFrame with the duplicate rows removed and leaves the original untouched. We can set the argument inplace=True to remove the duplicates from the original DataFrame instead. (In the example this passage refers to, the DataFrame started out with 80 rows and ended up with 77.)
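A minimal sketch of both behaviours, using a made-up three-row DataFrame in which the first two rows are identical:

```python
import pandas as pd

# Made-up sample data: rows 0 and 1 are exact duplicates
df = pd.DataFrame({"name": ["willy", "willy", "zoe"], "age": [10, 10, 10]})

# Default: returns a new DataFrame, df itself is unchanged
deduped = df.drop_duplicates()

# Restrict the comparison to one column with subset=
by_name = df.drop_duplicates(subset="name")

# inplace=True modifies the frame it is called on and returns None
df_copy = df.copy()
df_copy.drop_duplicates(inplace=True)

print(len(df), len(deduped), len(by_name), len(df_copy))
```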
To find duplicate columns, we need to iterate through all columns of the DataFrame and, for each column, check whether any other column has the same contents. If it does, that column's name is stored in a set of duplicate columns.
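The column-by-column search described above could be sketched like this. The helper name find_duplicate_columns is hypothetical (it is not a pandas function), and the sketch assumes the column labels themselves are unique.

```python
import pandas as pd


def find_duplicate_columns(df):
    """Return the names of columns whose contents equal an earlier column's.

    Hypothetical helper, not part of pandas. Assumes unique column labels.
    """
    duplicates = set()
    columns = list(df.columns)
    for i, col in enumerate(columns):
        if col in duplicates:
            # Already known to be a copy of an earlier column
            continue
        for other in columns[i + 1:]:
            # Series.equals compares values and dtype
            if df[col].equals(df[other]):
                duplicates.add(other)
    return duplicates


# Made-up sample data: column "c" repeats column "a"
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 2, 3]})
dup_cols = find_duplicate_columns(df)
print(dup_cols)
```

This compares every pair of columns, so it is quadratic in the number of columns; that is fine for a handful of columns but may be slow for very wide frames.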
If you want to find the rows with a duplicated name (every occurrence except the first), you can try this:
In [16]: import pandas as pd
In [17]: p1 = {'name': 'willy', 'age': 10}
In [18]: p2 = {'name': 'willy', 'age': 11}
In [19]: p3 = {'name': 'zoe', 'age': 10}
In [20]: df = pd.DataFrame([p1, p2, p3])
In [21]: df
Out[21]:
   age   name
0   10  willy
1   11  willy
2   10    zoe
In [22]: df.duplicated('name')
Out[22]:
0    False
1     True
2    False
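To go from that boolean mask to the actual list of names that occur more than once (what the question asks for), something along these lines should work. This is a sketch on the same sample data, and the variable names are my own.

```python
import pandas as pd

df = pd.DataFrame({"name": ["willy", "willy", "zoe"], "age": [10, 11, 10]})

# Option 1: mask every row whose name is repeated, then take the unique names
mask = df.duplicated("name", keep=False)
repeated = df.loc[mask, "name"].unique().tolist()

# Option 2: count the occurrences of each name and keep those seen more than once
counts = df["name"].value_counts()
repeated_by_count = counts[counts > 1].index.tolist()

# Option 3: if you want the rows rather than the names, groupby(...).filter
# keeps only the groups for which the function returns True
repeated_rows = df.groupby("name").filter(lambda g: len(g) > 1)

print(repeated, repeated_by_count, len(repeated_rows))
```

Option 3 is probably the closest to the groupby attempt in the question: a GroupBy object cannot be indexed with a boolean Series, but filter() does the per-group selection.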
A one liner can be:
x.set_index('name').index.get_duplicates()
The index has a method for finding duplicates; columns do not seem to have a similar method.
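One caveat: as far as I know, Index.get_duplicates() was deprecated in pandas 0.23 and is no longer available from pandas 1.0 onward. A sketch of an equivalent using Index.duplicated(), with made-up data standing in for x:

```python
import pandas as pd

# Made-up stand-in for the x used in the one-liner above
x = pd.DataFrame({"name": ["willy", "willy", "zoe"], "age": [10, 11, 10]})

idx = x.set_index("name").index

# Index.duplicated() flags repeats after the first occurrence;
# unique() collapses names that repeat more than twice into one entry
dupes = idx[idx.duplicated()].unique().tolist()

print(dupes)
```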