I have the following pandas dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
shops franchise items
0 shop1 franchise_A dog
1 shop2 franchise_A cat
2 shop3 franchise_A dog
3 shop4 franchise_A dog
4 shop5 franchise_B bird
5 shop6 franchise_B fish
So, each row is a unique sample (shop1, shop2, etc.), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, etc.).
In the items column, only four categorical values are possible: dog, cat, fish, bird. My goal is to create a barplot of the number of dog, cat, fish and bird entries for each franchise.
I would like the output to be
franchise dogs cats birds fish
franchise_A 3 1 0 0
franchise_B 0 0 1 1
I believe I first have to use groupby(), e.g.
df.groupby("franchise").count()
shops items
franchise
franchise_A 4 4
franchise_B 2 2
But I'm not sure how I count the number of items for each franchise.
You can use value_counts with unstack (thanks Nickil Maveli):
from collections import Counter  # used in the benchmark below
print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
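The column order and names above differ slightly from the desired output (dogs, cats, birds, fish). A small follow-up sketch, assuming you want to reorder and pluralize the column headers yourself:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

counts = df.groupby("franchise")["items"].value_counts().unstack(fill_value=0)
# Reorder the columns and rename them to the plural headers from the question
counts = counts[["dog", "cat", "bird", "fish"]]
counts.columns = ["dogs", "cats", "birds", "fish"]
print(counts.reset_index())
```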
Other solutions use crosstab and pivot_table:
print (pd.crosstab(df["franchise"], df['items']))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
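As a side note, crosstab can also append row and column totals via its margins parameter, which is handy for sanity-checking the counts. A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# margins=True appends an "All" row and column holding the totals
ct = pd.crosstab(df["franchise"], df["items"], margins=True)
print(ct)
```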
You could include the items column in the groupby, then use size.
>>> df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
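Since the question states that only four item values are possible, a categorical dtype can guarantee every item gets a column even when a franchise (or the whole frame) happens to have zero of it. A sketch, assuming the fixed category list ["dog", "cat", "bird", "fish"]:

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Declaring the categories up front makes unobserved items appear as 0 counts
df["items"] = pd.Categorical(df["items"], categories=["dog", "cat", "bird", "fish"])
out = df.groupby(["franchise", "items"], observed=False).size().unstack(fill_value=0)
print(out)
```

With observed=False, every (franchise, item) combination is materialized, so the column set stays stable even if, say, no shop ever stocks fish.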
(Rough) Benchmark
%timeit df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
100 loops, best of 3: 2.73 ms per loop
%timeit (df.groupby("franchise")['items'].apply(Counter).unstack(fill_value=0).astype(int))
100 loops, best of 3: 4.18 ms per loop
%timeit df.groupby('franchise')['items'].value_counts().unstack(fill_value=0)
100 loops, best of 3: 2.71 ms per loop