Suppose I had this data:
>>> df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
...                         "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})
>>> df
   age response
0   11      Yes
1   12       No
2   11      Yes
3   11      Yes
4   13      Yes
5   11       No
6   12      Yes
7   11      Yes
I would like to make a bar plot that shows the Yes/No responses aggregated by age. Is that possible at all? I have tried hist and kind=bar, but neither grouped the responses by age; instead, both graphed age and response separately.
It would look like this:
  ^
4 | o
3 | o
2 | o
1 | ox    ox    o
0 .----------------------->
    11    12    13
where o is "Yes" and x is "No".
Also, would it be possible to group the ages? If the ages ranged from 11 to 50, for instance, it might make sense to put them in 5-year bins. Finally, would it be possible to show percentages or counts on the axis or on the individual bars?
Non-numeric (categorical) data call for a bar graph or pie chart; numeric data call for a histogram or stemplot. Both histograms and bar graphs can show either frequency (counts) or relative frequency (proportions).
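To make the frequency vs. relative frequency distinction concrete, here is a minimal pandas sketch on the question's data. It assumes only that value_counts(normalize=True) is used to turn counts into proportions; nothing here is plotted yet.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
                        "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})

# Frequency: the raw count of each response
freq = df["response"].value_counts()

# Relative frequency: each count divided by the total number of rows
rel_freq = df["response"].value_counts(normalize=True)

print(freq)
print(rel_freq)
```

Either Series can be passed to .plot(kind="bar"); only the y-axis scale changes.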
Explanation: Yes, it is possible. In R, the same kind of chart for non-numeric data can be built with the dplyr and ggplot2 packages.
Plotting can be performed in pandas with the .plot() method, which creates the plot directly from a DataFrame or Series. It can be called in two ways: with a kind argument, as in df.plot(kind='bar'), or through the equivalent accessor method, as in df.plot.bar().
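A minimal sketch of the two calling styles, assuming a small hypothetical Series of response counts. The non-interactive Agg backend is selected only so the snippet runs without a display; in a notebook you would normally omit that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts, used only for illustration
counts = pd.Series({"No": 2, "Yes": 6})

# Style 1: the generic plot() method with a kind argument
fig1, ax1 = plt.subplots()
counts.plot(kind="bar", ax=ax1)

# Style 2: the equivalent accessor method
fig2, ax2 = plt.subplots()
counts.plot.bar(ax=ax2)
```

Both calls draw the same bars; the accessor form simply makes the plot type part of the method name.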
To generate a grouped bar plot, first group by age and response, count the rows in each group, and then unstack the response level into columns:
# count rows per (age, response) pair, then pivot the responses into columns
counts = df.groupby(['age', 'response']).size().unstack(fill_value=0)
counts.plot(kind='bar')
The result is a grouped bar chart with a No bar and a Yes bar for each age.
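The count table behind that plot can also be built with pd.crosstab, a swapped-in alternative to groupby plus unstack rather than what the answer above uses. This sketch builds both on the question's data so you can see they agree; ages with no "No" answers get a 0.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
                        "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})

# Rows = ages, columns = responses, cells = number of rows in each pair
counts = pd.crosstab(df["age"], df["response"])

# The groupby/unstack route, for comparison
via_groupby = df.groupby(["age", "response"]).size().unstack(fill_value=0)

print(counts)
```

Calling counts.plot(kind="bar") on either table gives the same grouped bar chart.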
To bin your data, take a look at pandas.cut() (see the docs). For categorical plots, I've found the seaborn package quite helpful; see its tutorial on categorical plots. Below is an example that plots the Yes/No counts for the bins you mention, using a random sample:
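import pandas as pd
from random import choice
from numpy.random import randint
df = pd.DataFrame(data={"age": randint(10, 50, 1000),  # 1000 random ages in [10, 50)
                        "response": [choice(['Yes', 'No']) for i in range(1000)]})
df['age_group'] = pd.cut(df.age, bins=list(range(10, 51, 5)), include_lowest=True)  # 5-year bins
df.head()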
   age response age_group
0   48      Yes  (45, 50]
1   31       No  (30, 35]
2   25      Yes  (20, 25]
3   29      Yes  (25, 30]
4   19      Yes  (15, 20]
import seaborn as sns
sns.countplot(y='response', hue='age_group', data=df, palette="Greens_d")
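The question also asks about showing percentages or counts. One way to do that is sketched below, combining the pd.cut binning from this answer with pd.crosstab(..., normalize="index") for within-bin percentages and Matplotlib's Axes.bar_label (available from Matplotlib 3.4) to print the raw counts on the bars. The seeded random sample is hypothetical, and plotting percentages while labelling counts is just one design choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical random sample, seeded so the result is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(11, 51, size=1000),  # ages 11..50
                   "response": rng.choice(["Yes", "No"], size=1000)})

# 5-year bins: (10, 15], (15, 20], ..., (45, 50]
df["age_group"] = pd.cut(df["age"], bins=list(range(10, 51, 5)))

# Raw counts per age group and response
counts = pd.crosstab(df["age_group"], df["response"])

# Row-wise percentages: share of Yes/No within each age group
pct = pd.crosstab(df["age_group"], df["response"], normalize="index") * 100

fig, ax = plt.subplots()
pct.plot(kind="bar", ax=ax)
ax.set_ylabel("% of responses in age group")

# One bar container per response column; label each bar with its raw count
for container, column in zip(ax.containers, pct.columns):
    ax.bar_label(container, labels=[str(c) for c in counts[column]])
```

To show counts on the axis instead, plot counts rather than pct and label the bars with the percentages.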