
How to make a bar plot of non-numerical data in pandas

Suppose I had this data:

>>> df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
                        "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})
>>> df
   age response
0   11      Yes
1   12       No
2   11      Yes
3   11      Yes
4   13      Yes
5   11       No
6   12      Yes
7   11      Yes

I would like to make a bar plot that shows the yes and no responses aggregated by age. Is this possible? I have tried hist and kind=bar, but neither grouped the responses by age; instead, both graphed age and response separately.

It would look like this:

  ^
4 |   o
3 |   o
2 |   o
1 |   ox      ox      o
0 .----------------------->
      11      12      13  

where o is "Yes", and x is "No".

Also, would it be possible to group the ages into bins? If the ages ranged from 11 to 50, for instance, they could be put in 5-year bins. Finally, would it be possible to show percentages or counts on the axis or on the individual bars?

Jean Nassar asked Dec 13 '15 13:12



2 Answers

To generate a multiple bar plot, you would first need to group by age and response and then unstack the dataframe:

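# Count rows per (age, response) pair, then pivot the responses into columns.
# A separate name keeps the original df intact; fill_value=0 avoids NaN for
# combinations that never occur (e.g. age 13 with "No").
counts = df.groupby(['age', 'response']).size().unstack(fill_value=0)
counts.plot(kind='bar')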

Here is the output plot:

Bar plot
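
For the follow-up about showing counts or percentages on the bars, one possible approach is sketched below. It rebuilds the question's data, builds the same kind of count table as above, derives row-wise percentages from it, and writes each bar's value on top of the bar. The helper name plot_with_labels is made up for illustration, and Axes.bar_label assumes Matplotlib 3.4 or newer.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "age": [11, 12, 11, 11, 13, 11, 12, 11],
    "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
})

# One row per age, one column per response; missing pairs become 0
counts = df.groupby(["age", "response"]).size().unstack(fill_value=0)

# Share of each response within an age, in percent (each row sums to 100)
percentages = counts.div(counts.sum(axis=1), axis=0) * 100


def plot_with_labels(table, ylabel):
    """Hypothetical helper: grouped bar chart with each bar's value on top."""
    ax = table.plot(kind="bar", rot=0)
    ax.set_ylabel(ylabel)
    for container in ax.containers:
        # Axes.bar_label was added in Matplotlib 3.4
        ax.bar_label(container, fmt="%g")
    return ax
```

Calling plot_with_labels(counts, "count") or plot_with_labels(percentages, "percent") should then draw the labelled chart for either table.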

Learner answered Nov 15 '22 06:11


To bin your data, take a look at pandas.cut() (see the docs). For categorical plots, I've found the seaborn package quite helpful; see its tutorial on categorical plots. Below is an example that plots the yes/no counts for the bins you mention, using a random sample:

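# Assumed imports; the original snippet did not show them
import pandas as pd
from numpy.random import choice, randint

# Random sample: 1000 ages in [10, 50), each with a random Yes/No response
df = pd.DataFrame(data={"age": randint(10, 50, 1000),
                    "response": [choice(['Yes', 'No']) for i in range(1000)]})

# Assign each age to a 5-year bin; include_lowest=True keeps age 10 in the first bin
df['age_group'] = pd.cut(df.age, bins=list(range(10, 51, 5)), include_lowest=True)
df.head()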

   age response age_group
0   48      Yes  (45, 50]
1   31       No  (30, 35]
2   25      Yes  (20, 25]
3   29      Yes  (25, 30]
4   19      Yes  (15, 20]

import seaborn as sns
sns.countplot(y='response', hue='age_group', data=df, palette="Greens_d")
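
If the age bins should sit on the x-axis with Yes and No side by side, as in the sketch in the question, a plain pandas variant along these lines might work. It is only a sketch: the seed, the sample, and the helper name plot_by_age_group are assumptions made for illustration, and pd.crosstab stands in for seaborn's counting step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary fixed seed so the sample is repeatable
df = pd.DataFrame({
    "age": rng.integers(10, 51, size=1000),            # ages 10..50 inclusive
    "response": rng.choice(["Yes", "No"], size=1000),
})

# 5-year bins: [10, 15], (15, 20], ..., (45, 50]
bins = list(range(10, 51, 5))
df["age_group"] = pd.cut(df["age"], bins=bins, include_lowest=True)

# Rows are age bins, columns are responses
counts = pd.crosstab(df["age_group"], df["response"])

# Same table as percentages within each age bin (each row sums to 100)
percent = pd.crosstab(df["age_group"], df["response"], normalize="index") * 100


def plot_by_age_group(table, ylabel):
    """Hypothetical helper: one group of bars per age bin, one bar per response."""
    ax = table.plot(kind="bar", rot=45)
    ax.set_xlabel("age group")
    ax.set_ylabel(ylabel)
    return ax
```

With seaborn, swapping the roles of the two columns, sns.countplot(x='age_group', hue='response', data=df), should give a similar layout for the counts.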


Stefan answered Nov 15 '22 05:11