 

Create barplot from string data using groupby and multiple columns in pandas dataframe

I'd like to make a bar plot in Python with multiple x-categories from counts of data that are either "Yes" or "No". I've started on some code, but I believe the track I'm on is a slow way of getting to the solution I want. I'd be fine with a solution that uses seaborn, Matplotlib, or pandas, but not Bokeh, because I'd like to make publication-quality figures that scale.

Ultimately what I want is:

  • a bar plot with the categories "canoe", "cruise", "kayak" and "ship" on the x-axis
  • grouped by "color", so either Green or Red
  • showing the proportion of "Yes" responses: the number of "Yes" rows divided by the number of rows of that color, which in this case is 4 Red and 4 Green, but that could change.

Here's the dataset I'm working with:

import pandas as pd
data = [{'ship': 'Yes','canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Red'},{'ship': 'No', 'cruise': 'Yes', 'kayak': 'No','canoe': 'Yes','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Red'}]
df = pd.DataFrame(data)

This is what I've started with:

print(df['color'].value_counts())

red = 4 # there must be a better way to code this rather than manually. Perhaps using len()?
green = 4

# get count per type
ca = df['canoe'].value_counts()
cr = df['cruise'].value_counts()
ka = df['kayak'].value_counts()
sh = df['ship'].value_counts()
print(ca, cr, ka, sh)

# group by color
cac = df.groupby(['canoe','color'])
crc = df.groupby(['cruise','color'])
kac = df.groupby(['kayak','color'])
shc = df.groupby(['ship','color'])

# make plots 
cac2 = cac['color'].value_counts().unstack()
cac2.plot(kind='bar', title = 'Canoe by color')


But really what I want is all of the x-categories to be on one plot, only showing the result for "Yes" responses, and taken as the proportion of "Yes" rather than just counts. Help?

JAG2024 asked Jul 26 '18 06:07





1 Answer

I'm not exactly sure I understand the question correctly, but it looks like it would make more sense to look at the proportion of "Yes" answers per boat type and color.

import matplotlib.pyplot as plt
import pandas as pd
data = [{'ship': 'Yes','canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Red'},{'ship': 'No', 'cruise': 'Yes', 'kayak': 'No','canoe': 'Yes','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Red'}]
df = pd.DataFrame(data)

ax = df.replace(["Yes","No"],[1,0]).groupby("color").mean().transpose().plot.bar(color=["g","r"])
ax.set_title('Proportion of "Yes" answers per boat type and color')
plt.show()


This means, for example, that 25% of the Green rows answered "Yes" for canoe.
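To check that figure without reading it off the plot, the same proportions can be computed as a table. This is only a sketch of one possible variant, not the code used for the plot: it compares against "Yes" directly instead of replacing strings with numbers, and the names `boat_cols`, `group_sizes`, `is_yes` and `proportions` are made up for illustration. It also shows that the group sizes do not need to be hard-coded.

```python
import pandas as pd

# Same records as in the question
data = [
    {'ship': 'Yes', 'canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'color': 'Red'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'No', 'color': 'Red'},
    {'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes', 'canoe': 'No', 'color': 'Red'},
    {'ship': 'No', 'cruise': 'Yes', 'kayak': 'No', 'canoe': 'Yes', 'color': 'Green'},
    {'ship': 'No', 'cruise': 'No', 'kayak': 'No', 'canoe': 'No', 'color': 'Green'},
    {'ship': 'No', 'cruise': 'No', 'kayak': 'No', 'canoe': 'No', 'color': 'Red'},
]
df = pd.DataFrame(data)

# Hypothetical helper name: the columns that hold Yes/No answers
boat_cols = ["canoe", "cruise", "kayak", "ship"]

# Rows per color, computed from the data instead of typing red = 4, green = 4
group_sizes = df["color"].value_counts()

# True where the answer is "Yes"; the mean of a boolean column is the
# share of True values, so the division by group size happens implicitly
is_yes = df[boat_cols].eq("Yes")
proportions = is_yes.groupby(df["color"]).mean().T

print(group_sizes)
print(proportions)
```

The canoe/Green cell of `proportions` is 0.25, matching the 25% read from the chart. Because the mean is taken within each color group, the denominators adjust on their own if the number of Red or Green rows changes.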

ImportanceOfBeingErnest answered Oct 13 '22 13:10
