For a dataframe
import pandas as pd
df=pd.DataFrame({'group':list("AADABCBCCCD"),'Values':[1,0,1,0,1,0,0,1,0,1,0]})
I am trying to draw a bar plot showing the percentage of times each of A, B, C, and D takes the value zero (or one).
I have a roundabout way that works, but I think there must be a more straightforward way:
import seaborn as sns
# Count each (group, value) pair, with one column per value
tempdf=df.groupby(['group','Values']).Values.count().unstack().fillna(0)
tempdf['total']=df['group'].value_counts()
# Percentage of zeros within each group
tempdf['percent']=tempdf[0]/tempdf['total']*100
tempdf.reset_index(inplace=True)
print(tempdf)
sns.barplot(x='group',y='percent',data=tempdf)
If I were plotting just the mean value, I could simply call sns.barplot on df rather than on tempdf. I am not sure how to do this elegantly when I want to plot percentages.
Thanks,
You can use Pandas in conjunction with seaborn to make this easier:
import pandas as pd
import seaborn as sns
df = sns.load_dataset("tips")
x, y, hue = "day", "proportion", "sex"
hue_order = ["Male", "Female"]
(df[x]
.groupby(df[hue])
.value_counts(normalize=True)
.rename(y)
.reset_index()
.pipe((sns.barplot, "data"), x=x, y=y, hue=hue, hue_order=hue_order))
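Applied to the df from the question, the same idea might look like the sketch below. It assumes that "percentage" means the share of rows in each group where Values equals 0. The names percent_zero and plot_percent_zero are made up for this example and are not part of pandas or seaborn.

```python
import pandas as pd

df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

# The mean of a boolean Series is the fraction of True values,
# so this gives the percentage of zeros within each group.
percent_zero = (df['Values'].eq(0)
                .groupby(df['group'])
                .mean()
                .mul(100)
                .rename('percent')
                .reset_index())


def plot_percent_zero(data):
    """Draw one bar per group (hypothetical helper for this example)."""
    # seaborn is imported here so the table above can be built without it
    import seaborn as sns
    return sns.barplot(x='group', y='percent', data=data)
```

Calling plot_percent_zero(percent_zero) should then draw the bars. Replacing eq(0) with eq(1) would give the percentage of ones instead.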
You can use the library Dexplot, which can return relative frequencies for categorical variables. Its API is similar to Seaborn's. Pass the column whose relative frequency you want to the count function. To subdivide the result by another column, use the split parameter. The following returns raw counts.
import dexplot as dxp
dxp.count('group', data=df, split='Values')
To get the relative frequencies, set the normalize parameter to the column you want to normalize over. Use True to normalize over the overall total count.
dxp.count('group', data=df, split='Values', normalize='group')
Normalizing over the 'Values' column would produce the following graph, where all the '0' bars sum to 1.
dxp.count('group', data=df, split='Values', normalize='Values')
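To see the numbers behind the three normalization choices without plotting, the arithmetic can be reproduced with pandas alone. This is only a sketch of the same calculation using pd.crosstab, not Dexplot's own implementation; the variable names are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

# Raw counts: one row per group, one column per value (0 and 1)
counts = pd.crosstab(df['group'], df['Values'])

# Normalize over 'group': the bars within each group sum to 1
by_group = pd.crosstab(df['group'], df['Values'], normalize='index')

# Normalize over 'Values': all the '0' bars sum to 1, as do all the '1' bars
by_value = pd.crosstab(df['group'], df['Values'], normalize='columns')

# Normalize over the overall total: every cell is divided by the row count of df
overall = pd.crosstab(df['group'], df['Values'], normalize='all')

print(counts)
print(by_group)
print(by_value)
print(overall)
```

Multiplying by_group by 100 gives percentages; its 0 column is the per-group percentage of zeros that the question asks for.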