pandas: drop duplicates in groupby 'date'

Tags: python, pandas, duplicates, unique, pandas-groupby

In the dataframe below, I would like to drop the rows with duplicate cid values within each date, so that the output of df.groupby('date').cid.size() matches the output of df.groupby('date').cid.nunique().

I have looked at this post but it does not seem to have a solid solution to the problem.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.df')

df.groupby('date').cid.size()

date
2005       7
2006     237
2007    3610
2008    1318
2009    2664
2010     997
2011    6390
2012    2904
2013    7875
2014    3979

df.groupby('date').cid.nunique()

date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
Name: cid, dtype: int64

Things I tried:

df.groupby([df['date']]).drop_duplicates(cols='cid') gives this error: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
df.groupby(('date').drop_duplicates('cid')) gives this error: AttributeError: 'str' object has no attribute 'drop_duplicates'

asked May 08 '16 22:05

Collective Action

1 Answer

You don't need groupby to drop duplicates based on a few columns; pass those columns to drop_duplicates as the subset instead:

df2 = df.drop_duplicates(["date", "cid"])
df2.groupby('date').cid.size()
Out[99]: 
date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
dtype: int64
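
To check the idea without downloading the CSV, here is a minimal sketch on made-up data. The toy frame, its "amount" column, and the variable names deduped, sizes and uniques are assumptions for illustration, not part of the original dataset. It keeps the first row for each (date, cid) pair and then confirms that the per-date row count equals the per-date number of unique cid values.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV from the question.
df = pd.DataFrame({
    "date": [2005, 2005, 2005, 2006, 2006, 2006],
    "cid": ["a", "a", "b", "c", "c", "c"],
    "amount": [10, 20, 30, 40, 50, 60],
})

# Keep only the first row for each (date, cid) combination.
deduped = df.drop_duplicates(subset=["date", "cid"])

# Rows per date after deduplication vs. unique cids per date before it.
sizes = deduped.groupby("date")["cid"].size()
uniques = df.groupby("date")["cid"].nunique()

# The two counts should now agree for every date.
assert (sizes == uniques).all()
print(sizes)
```

One caveat: nunique() ignores missing values by default, while drop_duplicates treats NaN as a value of its own, so the two counts can still differ on dates where cid contains NaN.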

answered Oct 04 '22 15:10

ayhan
