How to add legends and title to grouped histograms generated by Pandas

I am trying to plot histograms of multiple attributes, grouped by another attribute, all of them in one DataFrame.

With the help of this question, I am able to set a title for the plot. Is there an easy way to switch on a legend for each subplot?

Here is my code:

from numpy.random import randn, randint
import pandas as pd
import pylab as pl

x = pd.DataFrame(randn(100).reshape(20, 5), columns=list('abcde'))
# One group label per row; a Series shorter than the frame would leave the remaining rows as NaN
x['new'] = pd.Series(randint(0, 3, len(x)))
x.hist(by='new')
pl.suptitle('hist by new')

vumaasha asked Jun 18 '15 15:06


1 Answer

You can almost get what you want by calling plot on the groupby object (g is x.groupby(group_col), as defined in the code further down):

g.plot(kind='bar')

but this produces one plot per group and doesn't name the plots after the groups, so it's a bit useless IMO.

Here's something which looks rather beautiful, but does involve quite a lot of "manual" matplotlib work, which everyone wants to avoid, but no one can:

import numpy.random as rnd
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

x = pd.DataFrame(rnd.randn(100).reshape(20, 5), columns=list('abcde'))

# Assign each row to a random group
group_col = 'groups'
groups = ['foo', 'bar', 'baz']
x[group_col] = pd.Series(rnd.choice(groups, len(x)))

g = x.groupby(group_col)
num_groups = g.ngroups

# One subplot per group
fig, axes = plt.subplots(num_groups)
for i, (k, group) in enumerate(g):
    ax = axes[i]
    ax.set_title(k)
    # Drop the grouping column so only the value columns are binned
    group = group[[c for c in group.columns if c != group_col]]
    num_columns = len(group.columns)
    # One colour per column, evenly spaced along the Spectral colormap
    colours = cm.Spectral([float(j) / num_columns for j in range(num_columns)])
    ax.hist(group.values, 5, histtype='bar',
            label=list(group.columns), color=colours,
            linewidth=1, edgecolor='white')
    ax.legend()

plt.show()

Which I think gives you what you want:

[Image: Beautiful histogram]


Update: In response to comments (and as this answer is a few years old), I've tried to strip this answer down to its barest bones. There may now be a way of labelling plots of groupby objects, but I don't know of it.

Here's the simplest possible way to do this:

axes = g.plot(kind='hist')
for i, (groupname, group) in enumerate(g):
    axes[i].set_title(groupname)
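
A possible middle ground between the two approaches above, offered only as a sketch: create the subplots explicitly, then let pandas draw each group's histogram onto its own axes with the title and legend switched on. The helper name plot_grouped_hists and its parameters are hypothetical (they are not part of pandas or matplotlib), and the Agg backend line is an assumption made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_grouped_hists(df, by, bins=5, suptitle=None):
    """Draw one histogram subplot per group, each with a title and a legend.

    Hypothetical helper for illustration; not a pandas or matplotlib API.
    """
    grouped = df.groupby(by)
    # squeeze=False always returns a 2-D array of Axes, even for a single group
    fig, axes = plt.subplots(grouped.ngroups, 1, squeeze=False)
    for ax, (name, group) in zip(axes[:, 0], grouped):
        # Drop the grouping column so only the value columns are binned
        group.drop(columns=by).plot(kind="hist", bins=bins, alpha=0.5,
                                    ax=ax, title=str(name), legend=True)
    if suptitle is not None:
        fig.suptitle(suptitle)
    return fig, axes[:, 0]


# Same shape of data as in the question: 20 rows, 5 value columns, 1 group column
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((20, 5)), columns=list("abcde"))
x["new"] = rng.integers(0, 3, size=len(x))

fig, axes = plot_grouped_hists(x, by="new", suptitle="hist by new")
```

To view the result, call plt.show() with an interactive backend or fig.savefig(...) with a file name of your choice. Passing squeeze=False is a deliberate choice here: without it, plt.subplots returns a single Axes rather than an array when there is only one group, and indexing it fails.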
LondonRob answered Sep 19 '22 22:09
