I have a table containing dates and the various cars sold on each date, in the following format (these are only 2 of many columns):
DATE CAR
2012/01/01 BMW
2012/01/01 Mercedes Benz
2012/01/01 BMW
2012/01/02 Volvo
2012/01/02 BMW
2012/01/03 Mercedes Benz
...
2012/09/01 BMW
2012/09/02 Volvo
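For reproducibility, the sample rows above can be constructed with something like this (a hypothetical reconstruction; the real table has many more rows and columns):
import pandas as pd

df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/01',
             '2012/01/02', '2012/01/02', '2012/01/03',
             '2012/09/01', '2012/09/02'],
    'CAR': ['BMW', 'Mercedes Benz', 'BMW',
            'Volvo', 'BMW', 'Mercedes Benz',
            'BMW', 'Volvo'],
})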
I perform the following operation to find the number of BMW cars sold each day:
df[df.CAR=='BMW']['DATE'].value_counts()
The result is something like this:
2012/07/04 15
2012/07/08 8
...
2012/01/02 1
But there are some days on which no BMW was sold. In the result, along with the above, I want the days where there are zero occurrences of BMW. The desired result is therefore:
2012/07/04 15
2012/07/08 8
...
2012/01/02 1
2012/01/09 0
2012/08/11 0
What can I do to attain such a result?
You can reindex the result of value_counts with all the dates in the table and fill the missing values with 0:
df.loc[df.CAR == 'BMW', 'DATE'].value_counts().reindex(
df.DATE.unique(), fill_value=0)
Output:
2012/01/01 2
2012/01/02 1
2012/01/03 0
2012/09/01 1
2012/09/02 0
Name: DATE, dtype: int64
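Note that df.DATE.unique() only contains dates that appear somewhere in the table. If you also want calendar days on which nothing at all was sold, you could parse the dates and reindex against a full date range instead; a sketch, with the start and end dates assumed from the sample:
dates = pd.to_datetime(df['DATE'])
# Count BMW sales, then fill every calendar day in the assumed range with 0
dates[df.CAR == 'BMW'].value_counts().reindex(
    pd.date_range('2012-01-01', '2012-09-02'), fill_value=0)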
Instead of value_counts, you could also compare the column to 'BMW' and sum the resulting booleans, grouped by the dates; the grouping will include all of them:
df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()
Output:
DATE
2012/01/01 2
2012/01/02 1
2012/01/03 0
2012/09/01 1
2012/09/02 0
Name: CAR, dtype: int32
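As a variation on the same boolean-counting idea (not from the original answer), pd.crosstab tabulates every car at once, and each column already contains the zero days:
# Rows are dates, columns are car makes; missing combinations count as 0
pd.crosstab(df['DATE'], df['CAR'])['BMW']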
The default behavior of the category dtype
is exactly what you want: categories that are absent after filtering still display with a count of zero. You just need to make DATE categorical:
df.astype({'DATE': 'category'})[df.CAR=='BMW']['DATE'].value_counts()
or better yet, make it definitively a category in your dataframe:
df.DATE = df.DATE.astype('category')
df[df.CAR=='BMW'].DATE.value_counts()
The category type is a better representation of your data and more space-efficient.
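A quick check of why this works (assuming df.DATE has been converted as above): boolean filtering does not drop unused categories, so value_counts still reports them with a count of zero.
# The filtered Series keeps the full category list, zero-BMW dates included
df[df.CAR == 'BMW'].DATE.cat.categories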