I have the following pandas dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
shops franchise items
0 shop1 franchise_A dog
1 shop2 franchise_A cat
2 shop3 franchise_A dog
3 shop4 franchise_A dog
4 shop5 franchise_B bird
5 shop6 franchise_B fish
So, each row is a unique sample (shop1, shop2, etc.), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, etc.).
In the items column, only four categorical values are possible: dog, cat, fish, bird. My goal is to create a barplot of the number of dog, cat, fish and bird entries for each franchise.
I would like the output to be
franchise dogs cats birds fish
franchise_A 3 1 0 0
franchise_B 0 0 1 1
I believe I first have to use groupby(), e.g.
df.groupby("franchise").count()
shops items
franchise
franchise_A 4 4
franchise_B 2 2
But I'm not sure how I count the number of items for each franchise.
You can use value_counts with unstack (thanks Nickil Maveli):
from collections import Counter  # used in the benchmark below
print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
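The column order and names above differ slightly from the desired output (dogs, cats, birds, fish). A small follow-up sketch, assuming you want to reorder and pluralize the column headers yourself:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

counts = df.groupby("franchise")["items"].value_counts().unstack(fill_value=0)
# Reorder the columns and rename them to the plural headers from the question
counts = counts[["dog", "cat", "bird", "fish"]]
counts.columns = ["dogs", "cats", "birds", "fish"]
print(counts.reset_index())
```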
Other solutions use crosstab and pivot_table:
print (pd.crosstab(df["franchise"], df['items']))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
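As a side note, crosstab can also append row and column totals via its margins parameter, which is handy for sanity-checking the counts. A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# margins=True appends an "All" row and column holding the totals
ct = pd.crosstab(df["franchise"], df["items"], margins=True)
print(ct)
```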
You could include the items column in the groupby, then use size.
>>> df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
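Since the question states that only four item values are possible, a categorical dtype can guarantee every item gets a column even when a franchise (or the whole frame) happens to have zero of it. A sketch, assuming the fixed category list ["dog", "cat", "bird", "fish"]:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Declaring the categories up front makes unobserved items appear as 0 counts
df["items"] = pd.Categorical(df["items"], categories=["dog", "cat", "bird", "fish"])
out = df.groupby(["franchise", "items"], observed=False).size().unstack(fill_value=0)
print(out)
```

With observed=False, every (franchise, item) combination is materialized, so the column set stays stable even if, say, no shop ever stocks fish.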
(Rough) Benchmark
%timeit df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
100 loops, best of 3: 2.73 ms per loop
%timeit (df.groupby("franchise")['items'].apply(Counter).unstack(fill_value=0).astype(int))
100 loops, best of 3: 4.18 ms per loop
%timeit df.groupby('franchise')['items'].value_counts().unstack(fill_value=0)
100 loops, best of 3: 2.71 ms per loop