Zero occurrences/frequency using value_counts() in pandas

I have a table containing dates and the cars sold on each date, in the following format (these are only 2 of many columns):

DATE       CAR
2012/01/01 BMW
2012/01/01 Mercedes Benz
2012/01/01 BMW
2012/01/02 Volvo
2012/01/02 BMW
2012/01/03 Mercedes Benz
...
2012/09/01 BMW
2012/09/02 Volvo

I perform the following operation to find the number of BMW cars sold each day:

df[df.CAR=='BMW']['DATE'].value_counts()

The result is something like this:

2012/07/04 15
2012/07/08 8
...
2012/01/02 1

But there are some days when no BMW was sold. I want the result to also include those days, with a count of zero. The desired result is:

2012/07/04 15
2012/07/08 8
...
2012/01/02 1
2012/01/09 0
2012/08/11 0

What can I do to attain such a result?
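For reference, a minimal, self-contained reproduction of the issue (the sample data here is hypothetical, a small slice in the same shape as the table above): value_counts() only reports dates that actually appear in the filtered rows, so zero-sale days vanish.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/01',
             '2012/01/02', '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'BMW',
             'Volvo', 'BMW', 'Mercedes Benz'],
})

# Filter to BMW rows, then count how often each date occurs.
counts = df[df.CAR == 'BMW']['DATE'].value_counts()
print(counts)
# 2012/01/03 does not appear at all: no BMW was sold that day.
```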

asked Jul 25 '18 by Babaji





2 Answers

You can reindex the result after value_counts and fill the missing values with 0.

df.loc[df.CAR == 'BMW', 'DATE'].value_counts().reindex(
    df.DATE.unique(), fill_value=0)

Output:

2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: DATE, dtype: int64
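A runnable sketch of this reindex approach, using hypothetical sample data in place of the asker's full table:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# Count BMW sales per date, then reindex onto every date present in
# the data; dates with no BMW sale are filled in with 0.
counts = (df.loc[df.CAR == 'BMW', 'DATE']
            .value_counts()
            .reindex(df.DATE.unique(), fill_value=0))
print(counts)
```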

Alternatively, instead of value_counts, compare CAR to 'BMW' and sum the resulting booleans grouped by DATE. Since no rows are filtered out, every date is included in the result.

df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()

Output:

DATE
2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: CAR, dtype: int32
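The same comparison-plus-groupby approach as a self-contained sketch (again with hypothetical sample data):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# df['CAR'].eq('BMW') yields a boolean Series; summing it per DATE
# counts the True values, and every date survives because no rows
# are dropped before grouping.
counts = df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()
print(counts)
```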
answered Oct 16 '22 by ALollz


The default behavior of the category dtype is exactly what you want: categories that are absent from the filtered data still display with a count of zero. Note that DATE, the column whose values are being counted, is the one that needs to be categorical:

df.astype({'DATE': 'category'})[df.CAR=='BMW']['DATE'].value_counts()

or better yet, make it definitively a category in your dataframe:

df.DATE = df.DATE.astype('category')
df[df.CAR=='BMW'].DATE.value_counts()

The category type is also a better representation of your data and more space-efficient.
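A runnable sketch of the category approach (hypothetical sample data). Note that DATE, the column whose values are counted, is the one given the category dtype; value_counts() on a categorical Series reports every category, including those with zero occurrences in the filtered rows.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# Make the counted column categorical so unobserved dates still
# appear in the output of value_counts(), with a count of 0.
df['DATE'] = df['DATE'].astype('category')
counts = df[df.CAR == 'BMW']['DATE'].value_counts()
print(counts)
```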

answered Oct 16 '22 by neves