I have the following pandas dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
   shops    franchise items
0  shop1  franchise_A   dog
1  shop2  franchise_A   cat
2  shop3  franchise_A   dog
3  shop4  franchise_A   dog
4  shop5  franchise_B  bird
5  shop6  franchise_B  fish
So, each row is a unique sample (shop1, shop2, etc.), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, etc.).
In the items column, there are only four possible categorical values: dog, cat, fish, bird. My goal is to create a barplot of the number of dog, cat, fish, bird for each "franchise".
I would like the output to be
franchise        dogs    cats    birds    fish
franchise_A      3       1       0        0
franchise_B      0       0       1        1
I believe I first have to use groupby(), e.g.
df.groupby("franchise").count()
             shops  items
franchise                
franchise_A      4      4
franchise_B      2      2
But I'm not sure how to count the number of each item for each franchise.
Pandas value_counts() returns the counts of the unique values in a column. Starting from pandas version 1.1.0, value_counts() is available on a DataFrame as well as on a Series.
To count the number of occurrences in, e.g., a column of a dataframe, you can use the pandas value_counts() method. For example, if you type df['condition'].value_counts() you will get the frequency of each unique value in the column "condition".
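As a minimal sketch of both forms, applied to the dataframe from the question (the names item_counts and pair_counts are only illustrative, and the DataFrame form assumes pandas 1.1.0 or later):

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Series.value_counts(): frequency of each unique value in one column.
item_counts = df["items"].value_counts()
print(item_counts)

# DataFrame.value_counts() (pandas >= 1.1.0): frequency of each unique
# (franchise, items) combination, returned as a Series with a MultiIndex.
pair_counts = df[["franchise", "items"]].value_counts()
print(pair_counts)
```

The second result is in long form; it only lists combinations that actually occur, so it still needs reshaping to get one column per item.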
When we have two categorical variables, each value of one is likely to occur in a different number of rows for each value of the other. Counting those rows helps us understand how the two variables combine. In R, the count function of the dplyr package produces these counts.
Using the size() or count() method with pandas.DataFrame.groupby() gives the number of occurrences of the data in a particular column of the dataframe.
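The two methods are not interchangeable, which a small example may make clearer. The toy dataframe below is made up for illustration (it is not the question's data) and contains one missing item: size() counts every row in a group, while count() counts only the non-null values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in "items".
toy = pd.DataFrame({
    "franchise": ["franchise_A", "franchise_A", "franchise_B"],
    "items": ["dog", np.nan, "fish"],
})

grouped = toy.groupby("franchise")

# size(): a Series with the number of rows per group, NaN rows included.
sizes = grouped.size()
print(sizes)

# count(): a DataFrame with the number of non-null values per column.
counts = grouped.count()
print(counts)
```

Here franchise_A has two rows but only one non-null item, so the two results differ for that group.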
You can use value_counts with unstack, thanks Nickil Maveli:
from collections import Counter
print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
Other solutions use crosstab and pivot_table:
print (pd.crosstab(df["franchise"], df['items']))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
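The tables above list the items alphabetically and only include items that actually occur in the data. If a fixed column order is wanted, as in the question's desired output, one possible approach is to reindex the columns. This is a sketch: the list wanted is an assumption taken from the four categories named in the question.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

counts = pd.crosstab(df["franchise"], df["items"])

# Assumed order, taken from the question's desired output. Any category
# missing from the data would be added as a column of zeros.
wanted = ["dog", "cat", "bird", "fish"]
counts = counts.reindex(columns=wanted, fill_value=0)
print(counts)

# For the bar plot mentioned in the question, counts.plot(kind="bar")
# should draw one group of bars per franchise (requires matplotlib).
```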
You could include the items column in the groupby, then use size.
>>> df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
items        bird  cat  dog  fish
franchise                        
franchise_A     0    1    3     0
franchise_B     1    0    0     1
(Rough) Benchmark
%timeit df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
100 loops, best of 3: 2.73 ms per loop
%timeit (df.groupby("franchise")['items'].apply(Counter).unstack(fill_value=0).astype(int))
100 loops, best of 3: 4.18 ms per loop
%timeit df.groupby('franchise')['items'].value_counts().unstack(fill_value=0)
100 loops, best of 3: 2.71 ms per loop
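The %timeit lines above only work in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library timeit module. The candidates dict and the choice of 100 runs are illustrative, the crosstab entry is added for comparison, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A"] * 4 + ["franchise_B"] * 2,
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Approaches to compare, each wrapped in a zero-argument callable.
candidates = {
    "size + unstack": lambda: df.groupby(["franchise", "items"]).size().unstack(fill_value=0),
    "value_counts + unstack": lambda: df.groupby("franchise")["items"].value_counts().unstack(fill_value=0),
    "crosstab": lambda: pd.crosstab(df["franchise"], df["items"]),
}

runs = 100
timings = {}
for name, func in candidates.items():
    # Total seconds for `runs` calls, converted to milliseconds per call.
    timings[name] = timeit.timeit(func, number=runs) / runs * 1000
    print(f"{name}: {timings[name]:.3f} ms per loop")
```

With only six rows, these timings mostly measure fixed overhead, so the ranking may change on larger data.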