
What is the most efficient way of counting occurrences in pandas?

Tags: python, pandas

I have a large (about 12M rows) DataFrame df:

df.columns = ['word','documents','frequency']

The following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

However, this is taking an unexpectedly long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words DataFrame to take very long to build.

tipanverella asked Nov 19 '13 15:11


1 Answer

I think df['word'].value_counts() should do the job. Skipping the groupby machinery will save some time. I'm not sure why count should be much slower than max: both spend some time skipping missing values. (Compare with size, which counts every row in a group, missing values included.)
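To make the count versus size distinction concrete, here is a minimal sketch on a toy frame. The data is made up for illustration (it is not the asker's 12M-row DataFrame), and it says nothing about timings; it only shows how the two methods treat a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "a" has one missing frequency.
df = pd.DataFrame({
    "word": ["a", "a", "b", "b", "b"],
    "frequency": [1.0, np.nan, 2.0, 3.0, 4.0],
})

grouped = df.groupby("word")["frequency"]

# count() tallies only the non-missing values in each group.
non_null_counts = grouped.count()

# size() tallies every row in each group, missing or not.
group_sizes = grouped.size()
```

For group "a", count gives 1 while size gives 2; for group "b", both give 3. If the goal is simply "how many rows per word", size is the closer match.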

In any case, value_counts has been specifically optimized to handle object dtype, such as your words, so I doubt you'll do much better than that.
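As a rough sketch of how that might look, assuming a small stand-in DataFrame with the same three columns as in the question (the rows and the "Occurrences" column name are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
df = pd.DataFrame({
    "word": ["cat", "dog", "cat", "bird", "cat", "dog"],
    "documents": [1, 1, 2, 2, 3, 3],
    "frequency": [4, 2, 7, 1, 3, 5],
})

# Series indexed by word, sorted by descending count.
counts = df["word"].value_counts()

# Turn it into a two-column DataFrame. rename_axis plus
# reset_index(name=...) sets both column names explicitly, so the
# result does not depend on how value_counts names its output.
occurrences = counts.rename_axis("word").reset_index(name="Occurrences")

# The groupby equivalent, for comparison: one row count per word.
sizes = df.groupby("word").size()
```

Both counts and sizes hold the number of rows per word; value_counts additionally sorts by frequency and, by default, drops missing words.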

Dan Allan answered Sep 27 '22 23:09
