
Create a heatmap of two categorical variables

I have a dataset with the following three variables:

  1. df['score']: a float dummy variable (1 or 0)
  2. df['Province']: an object column where each row is a region
  3. df['Product type']: an object column indicating the industry

I would like to create a jointplot with the different industries on the x axis, the different provinces on the y axis, and the relative frequency of the score as the colour of each cell. Something like this: https://seaborn.pydata.org/examples/hexbin_marginals.html

So far, I have only managed the following:

mean = df.groupby(['Province', 'Product type'])['score'].mean()

But I am not sure how to plot it.

Thanks!

Filippo Sebastio asked Mar 04 '23 07:03


1 Answer

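If you are looking for a heatmap, you can use seaborn's heatmap function. However, you need to pivot your table first, so that the provinces become the rows, the product types become the columns, and each cell holds the mean score.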

Just creating a small example:

import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

score = [1, 1, 1, 0, 1, 0, 0, 0]
provinces = ['Place1', 'Place2', 'Place2', 'Place3', 'Place1', 'Place2', 'Place3', 'Place1']
products = ['Product1', 'Product3', 'Product2', 'Product2', 'Product1', 'Product2', 'Product1', 'Product1']
df = pd.DataFrame({'Province': provinces,
                   'Product type': products,
                   'score': score})

My df looks like:

  Province Product type  score
0   Place1     Product1      1
1   Place2     Product3      1
2   Place2     Product2      1
3   Place3     Product2      0
4   Place1     Product1      1
5   Place2     Product2      0
6   Place3     Product1      0
7   Place1     Product1      0

Then:

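# Pivot to a Province x Product type grid holding the mean score of each cell.
# The string 'mean' avoids the FutureWarning that newer pandas raises for np.mean.
df_heatmap = df.pivot_table(values='score', index='Province', columns='Product type', aggfunc='mean')
sns.heatmap(df_heatmap, annot=True)
plt.show()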
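
If you already have the grouped mean from the question, it can probably be reshaped directly instead of pivoting the raw frame. A minimal sketch, assuming the same toy data as above (rebuilt here so the snippet runs on its own); the names grid and pivoted are only illustrative:

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    'Province': ['Place1', 'Place2', 'Place2', 'Place3',
                 'Place1', 'Place2', 'Place3', 'Place1'],
    'Product type': ['Product1', 'Product3', 'Product2', 'Product2',
                     'Product1', 'Product2', 'Product1', 'Product1'],
    'score': [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean score per (Province, Product type) pair, as computed in the question.
mean = df.groupby(['Province', 'Product type'])['score'].mean()

# Move the 'Product type' index level into the columns to get a 2-D grid.
grid = mean.unstack('Product type')

# The pivot_table call produces the same grid in a single step.
pivoted = df.pivot_table(values='score', index='Province',
                         columns='Product type', aggfunc='mean')
```

Either grid or pivoted can then be passed to sns.heatmap in the same way as df_heatmap.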

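The result is a heatmap with the provinces as rows, the product types as columns, and the mean score annotated in each cell.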

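
One thing to keep in mind with the pivoted table: province/product combinations that never occur in the data come out as NaN, and seaborn leaves those cells blank. The sketch below, again on the toy data and with illustrative names (means, counts, empty), counts the rows behind each cell with pd.crosstab, so that an empty cell can be told apart from a cell whose mean is genuinely 0:

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    'Province': ['Place1', 'Place2', 'Place2', 'Place3',
                 'Place1', 'Place2', 'Place3', 'Place1'],
    'Product type': ['Product1', 'Product3', 'Product2', 'Product2',
                     'Product1', 'Product2', 'Product1', 'Product1'],
    'score': [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean score per cell; combinations with no rows become NaN.
means = df.pivot_table(values='score', index='Province',
                       columns='Product type', aggfunc='mean')

# Number of rows behind each cell; absent combinations are counted as 0.
counts = pd.crosstab(df['Province'], df['Product type'])

# True where a combination has no observations at all.
empty = counts.eq(0)
```

Since the score is a 0/1 dummy, it may also help to pin the colour scale to the full range with sns.heatmap(means, annot=True, vmin=0, vmax=1), so that colours stay comparable between plots.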
vmouffron answered Mar 19 '23 23:03