
How to turn groupby() and value_counts() into multiple pie/bar charts

Let's assume I have a dataframe and I'm looking at 2 columns of it (2 series).

Using one of the columns ("no_employees" below), can someone help me figure out how to create 6 different pie charts or bar charts (one for each grouping of no_employees) that illustrate the value counts of the Yes/No values in the treatment column? I'll use matplotlib or seaborn, whichever is easiest.

I'm using the following line of code to generate the output below.

dataframe_title.groupby(['no_employees']).treatment.value_counts()

But now I'm stuck. Do I use seaborn? .plot? This seems like it should be easy, and I know there are some cases where I can set subplots=True, but I'm really confused. Thank you so much.

no_employees    treatment
1-5             Yes           88
                No            71
100-500         Yes           95
                No            80
26-100          Yes          149
                No           139
500-1000        No            33
                Yes           27
6-25            No           162
                Yes          127
More than 1000  Yes          146
                No           135
dbs5 asked Aug 23 '19 00:08



1 Answer

The importance of data encoding:

  1. The purpose of data visualization is to convey information more easily (in this case, the relative number of 'treatments' per category).
  2. A bar chart easily displays the important information:
    • how many in each group said 'Yes' or 'No'
    • the relative sizes of each group
  3. A pie plot is more commonly used to display a single sample, where the groups within the sample sum to 100%.
    • Wikipedia: Pie Chart
      • Research has shown that comparison by angle is less accurate than comparison by length, because people are less able to discern differences.
      • Statisticians generally regard pie charts as a poor method of displaying information, and they are uncommon in scientific literature.
    • This data is not well represented by a pie plot, because each company size is a separate population, so 6 pie plots are required to represent it correctly.
    • The data can be placed into a pie plot, as others have shown, but that doesn't mean it should be.
  • Regardless of the type of plot, the data must be in the correct shape for the plot API.
  • Tested with pandas 1.3.0, seaborn 0.11.1, and matplotlib 3.4.2
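If pie charts are still wanted, one per company size as the question asks, a minimal sketch could look like the following. It is not part of the original answer: the counts are copied from the output in the question, and the 2x3 grid, titles, and percentage labels are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# counts copied from the groupby/value_counts output shown in the question
counts = pd.DataFrame(
    {'Yes': [88, 127, 149, 95, 27, 146],
     'No': [71, 162, 139, 80, 33, 135]},
    index=['1-5', '6-25', '26-100', '100-500', '500-1000', 'More than 1000'],
)
counts.index.name = 'no_employees'

# one pie per company size, laid out on a 2x3 grid (layout is an arbitrary choice)
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(10, 6))
for ax, (group, row) in zip(axes.flat, counts.iterrows()):
    ax.pie(row.to_numpy(), labels=list(row.index), autopct='%1.0f%%', startangle=90)
    # put the group size in the title, since a pie hides absolute counts
    ax.set_title(f'{group} (n={int(row.sum())})')

fig.suptitle('treatment by number of employees')
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Each pie is its own population, so only the Yes/No split within a pie can be compared, not slice sizes across pies.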

Set up a test DataFrame

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np  # for sample data only

np.random.seed(365)
cats = ['1-5', '6-25', '26-100', '100-500', '500-1000', '>1000']

data = {'no_employees': np.random.choice(cats, size=(1000,)),
        'treatment': np.random.choice(['Yes', 'No'], size=(1000,))}

df = pd.DataFrame(data)

# make no_employees an ordered categorical so the x-axis follows company size
df.no_employees = pd.Categorical(df.no_employees, categories=cats, ordered=True)

  no_employees treatment
0       26-100        No
1          1-5       Yes
2        >1000        No
3      100-500       Yes
4     500-1000       Yes

Plotting with pandas.DataFrame.plot():

  • This requires grouping the dataframe to get .value_counts, and unstacking the resulting Series with pandas.Series.unstack.
# to get the dataframe in the correct shape, unstack the groupby result
dfu = df.groupby(['no_employees']).treatment.value_counts().unstack()

treatment     No  Yes
no_employees         
1-5           78   72
6-25          83   86
26-100        83   76
100-500       91   84
500-1000      78   83
>1000         95   91

# plot
ax = dfu.plot(kind='bar', figsize=(7, 5), xlabel='Number of Employees in Company', ylabel='Count', rot=0)
ax.legend(title='treatment', bbox_to_anchor=(1, 1), loc='upper left')
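As an alternative to groupby, value_counts, and unstack, the same wide table can probably be built in one step with pandas.crosstab. This is a swapped-in technique, not the method used in the answer, and the small dataframe below is hypothetical sample data, not the test DataFrame above.

```python
import pandas as pd

# small hypothetical sample, only to compare the two reshaping routes
df = pd.DataFrame({
    'no_employees': ['1-5', '1-5', '6-25', '6-25', '6-25', '26-100'],
    'treatment': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
})

# route used in the answer: count per group, then pivot 'treatment' into columns
# (fill_value=0 covers groups where one of the answers never occurs)
wide_groupby = df.groupby('no_employees').treatment.value_counts().unstack(fill_value=0)

# alternative route: cross-tabulate the two columns directly
wide_crosstab = pd.crosstab(df.no_employees, df.treatment)
```

Either result has one row per company size and one column per treatment value, which is the shape DataFrame.plot(kind='bar') expects.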



Plotting with seaborn

  • seaborn is a high-level API for matplotlib.

seaborn.barplot()

  • Requires a DataFrame in a tidy (long) format, which is done by grouping the dataframe to get .value_counts, and resetting the index with pandas.Series.reset_index
  • May also be done with the figure-level interface using sns.catplot() with kind='bar'
# groupby, get value_counts, and reset the index
dft = df.groupby(['no_employees']).treatment.value_counts().reset_index(name='Count')

   no_employees treatment  Count
0           1-5        No     78
1           1-5       Yes     72
2          6-25       Yes     86
3          6-25        No     83
4        26-100        No     83
5        26-100       Yes     76
6       100-500        No     91
7       100-500       Yes     84
8      500-1000       Yes     83
9      500-1000        No     78
10        >1000        No     95
11        >1000       Yes     91

# plot
p = sns.barplot(x='no_employees', y='Count', data=dft, hue='treatment')
p.legend(title='treatment', bbox_to_anchor=(1, 1), loc='upper left')
p.set(xlabel='Number of Employees in Company')

seaborn.countplot()

  • Uses the original dataframe, df, without any transformations.
  • May also be done with the figure-level interface using sns.catplot() with kind='count'
p = sns.countplot(data=df, x='no_employees', hue='treatment')
p.legend(title='treatment', bbox_to_anchor=(1, 1), loc='upper left')
p.set(xlabel='Number of Employees in Company')
  • Output of barplot and countplot
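The figure-level interface mentioned above can also produce one small bar chart per company size, which is close to what the question asks for. The following is a minimal sketch, assuming the same kind of random sample data as in the setup; the col_wrap, height, and title template values are illustrative choices, not taken from the answer.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# random sample data in the same spirit as the setup (values are not real survey data)
rng = np.random.default_rng(365)
cats = ['1-5', '6-25', '26-100', '100-500', '500-1000', '>1000']
df = pd.DataFrame({
    'no_employees': rng.choice(cats, size=1000),
    'treatment': rng.choice(['Yes', 'No'], size=1000),
})
df['no_employees'] = pd.Categorical(df['no_employees'], categories=cats, ordered=True)

# one facet per company size; each facet counts the Yes/No answers
g = sns.catplot(data=df, x='treatment', col='no_employees', col_wrap=3,
                kind='count', height=2.5, aspect=1.2)
g.set_titles('{col_name} employees')
g.set_axis_labels('treatment', 'Count')
```

The facets share a y-axis by default, so bar heights stay comparable across company sizes, which separate pie charts cannot offer.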


Trenton McKinney answered Sep 16 '22 16:09