Pandas: How can I remove duplicate rows from DataFrame and calculate their frequency?

I have created a DataFrame:

import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'year': [2000, 2001, 1998, 1999, 1998, 1998, 2000]})

It looks like this:

    key    year
0    b    2000  
1    b    2001  
2    a    1998  
3    c    1999  
4    a    1998  
5    a    1998  
6    b    2000

I want to count the number of occurrences of each distinct row, as fast as possible:

key  year    frequency  
b    2000    2  
b    2001    1  
a    1998    3  
c    1999    1

asked Feb 04 '14 17:02

Laura

1 Answer

Running

df1.groupby(['key','year']).size().reset_index()

gives:

  key  year  0
0   a  1998  3
1   b  2000  2
2   b  2001  1
3   c  1999  1

As you can see, the count column is unnamed (it is labelled 0), so you can rename it like this:

mydf = df1.groupby(['key','year']).size().reset_index()
mydf.rename(columns = {0: 'frequency'}, inplace = True)

mydf

  key  year  frequency
0   a  1998          3
1   b  2000          2
2   b  2001          1
3   c  1999          1
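
As a shorter variant of the rename step above, `Series.reset_index` accepts a `name` argument, so the count column can be labelled in the same call. The sketch below also passes `sort=False` to `groupby`, which is an addition not in the original answer: it keeps the groups in order of first appearance, matching the row order shown in the question. The variable name `freq` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'year': [2000, 2001, 1998, 1999, 1998, 1998, 2000]})

# sort=False keeps groups in order of first appearance instead of sorting by key
freq = (df1.groupby(['key', 'year'], sort=False)
           .size()                            # Series of counts per (key, year)
           .reset_index(name='frequency'))    # name the count column directly

print(freq)
```

Whether this is measurably faster than the two-step version is not something this sketch demonstrates; it mainly saves the separate rename.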

(You can omit the .reset_index() if you want, but then mydf is a Series rather than a DataFrame, so you need to convert it first with mydf = pd.DataFrame(mydf) and only then rename the column.)
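
For the case without `.reset_index()`, one possible alternative to wrapping the result in `pd.DataFrame` is `Series.to_frame`, which converts and names the column in one step. This is a sketch, not the answerer's method; the names `counts` and `indexed` are hypothetical. The result keeps `key` and `year` as a MultiIndex, so a single combination can be looked up directly.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'year': [2000, 2001, 1998, 1999, 1998, 1998, 2000]})

# Without reset_index(), size() returns a Series indexed by (key, year)
counts = df1.groupby(['key', 'year']).size()

# to_frame() turns the Series into a one-column DataFrame and names that column
indexed = counts.to_frame(name='frequency')

# Look up the count for one (key, year) pair via the MultiIndex
print(indexed.loc[('a', 1998), 'frequency'])  # prints 3
```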

answered Nov 03 '22 03:11

mkln

Tags: python, pandas, duplicates