Zero occurrences/frequency using value_counts() in pandas

I have a table containing dates and the cars sold on each date, in the following format (these are only 2 of many columns):

DATE       CAR
2012/01/01 BMW
2012/01/01 Mercedes Benz
2012/01/01 BMW
2012/01/02 Volvo
2012/01/02 BMW
2012/01/03 Mercedes Benz
...
2012/09/01 BMW
2012/09/02 Volvo

I perform the following operation to find the number of BMW cars sold each day:

df[df.CAR=='BMW']['DATE'].value_counts()

The result is something like this:

2012/07/04 15
2012/07/08 8
...
2012/01/02 1

But there are some days when no BMW was sold. I want the result to also include those days, with a count of zero. The desired result is:

2012/07/04 15
2012/07/08 8
...
2012/01/02 1
2012/01/09 0
2012/08/11 0

What can I do to attain such a result?
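For reference, a minimal, self-contained reproduction of the issue (the sample data here is hypothetical, a small slice in the same shape as the table above): value_counts() only reports dates that actually appear in the filtered rows, so zero-sale days vanish.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/01',
             '2012/01/02', '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'BMW',
             'Volvo', 'BMW', 'Mercedes Benz'],
})

# Filter to BMW rows, then count how often each date occurs.
counts = df[df.CAR == 'BMW']['DATE'].value_counts()
print(counts)
# 2012/01/03 does not appear at all: no BMW was sold that day.
```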

asked Jul 25 '18 by Babaji





2 Answers

You can reindex the result after value_counts and fill the missing values with 0.

df.loc[df.CAR == 'BMW', 'DATE'].value_counts().reindex(
    df.DATE.unique(), fill_value=0)

Output:

2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: DATE, dtype: int64
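A runnable sketch of this reindex approach, using hypothetical sample data in place of the asker's full table:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# Count BMW sales per date, then reindex onto every date present in
# the data; dates with no BMW sale are filled in with 0.
counts = (df.loc[df.CAR == 'BMW', 'DATE']
            .value_counts()
            .reindex(df.DATE.unique(), fill_value=0))
print(counts)
```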

Alternatively, instead of value_counts, compare CAR to 'BMW' and sum the resulting booleans grouped by DATE. Since no rows are filtered out, every date is included in the result.

df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()

Output:

DATE
2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: CAR, dtype: int32
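The same comparison-plus-groupby approach as a self-contained sketch (again with hypothetical sample data):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# df['CAR'].eq('BMW') yields a boolean Series; summing it per DATE
# counts the True values, and every date survives because no rows
# are dropped before grouping.
counts = df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()
print(counts)
```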
answered Oct 16 '22 by ALollz


The default behavior of the category dtype is exactly what you want: categories that are absent from the filtered data still display with a count of zero. Note that DATE, the column whose values are being counted, is the one that needs to be categorical:

df.astype({'DATE': 'category'})[df.CAR=='BMW']['DATE'].value_counts()

or better yet, make it definitively a category in your dataframe:

df.DATE = df.DATE.astype('category')
df[df.CAR=='BMW'].DATE.value_counts()

The category type is also a better representation of your data and more space-efficient.
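A runnable sketch of the category approach (hypothetical sample data). Note that DATE, the column whose values are counted, is the one given the category dtype; value_counts() on a categorical Series reports every category, including those with zero occurrences in the filtered rows.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DATE': ['2012/01/01', '2012/01/01', '2012/01/02',
             '2012/01/02', '2012/01/03'],
    'CAR':  ['BMW', 'Mercedes Benz', 'Volvo', 'BMW', 'Mercedes Benz'],
})

# Make the counted column categorical so unobserved dates still
# appear in the output of value_counts(), with a count of 0.
df['DATE'] = df['DATE'].astype('category')
counts = df[df.CAR == 'BMW']['DATE'].value_counts()
print(counts)
```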

answered Oct 16 '22 by neves