 

Plotting stacked histograms in python using matplotlib

I am trying to plot some data so I can visualise it, but I am having some trouble. I have two variables. One is discrete (0 or 1) and is called label. The other is continuous, anywhere between 0 and 1. I wish to create a histogram with several bars on the x axis, for example one for every 0.25 of data, so four bars: the first covering 0-0.25, the second 0.25-0.5, the third 0.5-0.75 and the fourth 0.75-1.

Each bar will then be split up along the y axis by whether label is 1 or 0, so we end up with a graph like this:

Please excuse the poor paint image!

If there are any effective, intelligent ways to split up my data (rather than just hardcoding four bars for these values), I would be interested in this too, though that probably warrants another question. I will post it once I have this code running.
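On the side question of choosing bins rather than hardcoding them: a minimal sketch, assuming NumPy 1.15 or later (which provides np.histogram_bin_edges) and using randomly generated values as a hypothetical stand-in for the real data. It contrasts four fixed-width bins with edges that NumPy estimates from the data.

```python
import numpy as np

# Hypothetical stand-in data: 200 continuous values in [0, 1).
rng = np.random.default_rng(0)
x = rng.random(200)

# Fixed bins: four equal-width bins covering [0, 1].
fixed_edges = np.linspace(0, 1, 5)  # 0, 0.25, 0.5, 0.75, 1

# Data-driven bins: 'auto' lets NumPy pick the bin width from the data
# (other estimators include 'fd' and 'sturges').
auto_edges = np.histogram_bin_edges(x, bins='auto', range=(0, 1))

# Either set of edges can be passed as `bins` to np.histogram or plt.hist.
counts, _ = np.histogram(x, bins=auto_edges)
print(len(auto_edges) - 1, "automatic bins")
```

Fixed edges keep the bars easy to read; the estimators adapt the bar count to how much data there is.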

I have both values stored in numpy arrays as follows, but am unsure how to plot a graph like this:

import numpy as np
import pylab as P
import matplotlib.pyplot as plt

variable_values = trainData.get_vector('variable')  # returns a one-dimensional numpy array of values
label_values = trainData.get_vector('label')
mask = variable_values != '?'  # removing void values; same mask keeps x and y aligned
x = variable_values[mask].astype(float)
y = label_values[mask].astype(float)

fig = plt.figure()

plt.title("Feature breakdown histogram")
plt.xlabel("Variable")
plt.xlim(0, 1)
plt.ylabel("Label")
plt.ylim(0, 1)
xvals = np.linspace(0, 1, 51)  # 0.00, 0.02, ..., 1.00 (third argument is the number of points)

plt.show()

The matplotlib tutorial shows the following code, which roughly achieves what I want, but I can't really understand how it works (LINK):

P.figure()

n, bins, patches = P.hist(x, 10, density=True, histtype='bar', stacked=True)  # density replaces the removed normed argument

P.show()
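One way to read the tutorial snippet: stacked=True only has a visible effect when hist() receives a list of several datasets, which it draws on top of each other in list order. A minimal sketch, assuming synthetic x and y arrays as hypothetical stand-ins for the filtered arrays in the question, splits x by label and stacks the two groups into four bars.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace with the filtered x and y from the question.
rng = np.random.default_rng(0)
x = rng.random(200)                      # continuous values in [0, 1)
y = (rng.random(200) < x).astype(float)  # labels 0.0 / 1.0

# One array per label; hist() stacks the datasets in the order given.
groups = [x[y == 0], x[y == 1]]

fig, ax = plt.subplots()
counts, bins, patches = ax.hist(
    groups,
    bins=4,          # four bars: 0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1
    range=(0, 1),
    stacked=True,
    histtype='bar',
    color=['r', 'y'],
    label=['label = 0', 'label = 1'],
)
ax.set_xlabel("Variable")
ax.set_ylabel("Count")
ax.set_title("Feature breakdown histogram")
ax.legend()
plt.show()
```

With stacked=True the returned counts are cumulative, so the last entry holds the total height of each bar.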

Any help is greatly appreciated. Thank you.

Edit:

I am now getting the error:

AssertionError: incompatible sizes: argument 'height' must be length 5 or scalar

I have printed my two numpy arrays and they are of equal length; one is discrete, the other continuous. Here is the code I am running:

mask = variable_values != '?'  # one mask for both arrays so the pairs stay aligned
x = variable_values[mask].astype(float)
y = label_values[mask].astype(float)

print(x)  # printing numpy arrays of equal size: x is continuous, y is discrete. Both are float now.
print(y)

N = 5
ind = np.arange(N)    # the x locations for the groups
width = 0.45       # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, y,   width, color='r') #error occurs here
p2 = plt.bar(ind, x, width, color='y',
             bottom=x)

plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind+width/2., ('G1', 'G2', 'G3', 'G4', 'G5') )
plt.yticks(np.arange(0,81,10))
plt.legend( (p1[0], p2[0]), ('Men', 'Women') )

plt.show()
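A likely reading of the AssertionError: plt.bar needs one height per x position, and ind has 5 positions while x and y have one entry per sample. A minimal sketch, assuming synthetic arrays as hypothetical stand-ins for the real data, shows how np.histogram collapses any number of samples into one count per bin for each label.

```python
import numpy as np

# Hypothetical stand-in data; in the question these are the filtered x and y arrays.
rng = np.random.default_rng(1)
x = rng.random(500)
y = (rng.random(500) < x).astype(float)

edges = np.linspace(0, 1, 5)  # 4 bins -> 5 edges

# One count per bin, regardless of how many samples there are.
counts0, _ = np.histogram(x[y == 0], bins=edges)
counts1, _ = np.histogram(x[y == 1], bins=edges)

print(len(x), len(counts0), len(counts1))  # prints: 500 4 4
```

Arrays of this length can then be passed to plt.bar with ind = np.arange(4).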
Simon Kiely asked Dec 01 '25 03:12


1 Answer

I think this other tutorial from the same Matplotlib gallery will be much more revealing to you ...

Notice that the second series of data has an extra argument in the call: bottom.

p1 = plt.bar(ind, menMeans,   width, color='r', yerr=womenStd)
p2 = plt.bar(ind, womenMeans, width, color='y',
             bottom=menMeans, yerr=menStd)

Just replace menMeans with x and womenMeans with y.
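The substitution works once x and y have been reduced to one value per bar; passed raw per-sample arrays, the heights no longer match ind. A minimal sketch, assuming synthetic data as a hypothetical stand-in and four fixed bins, counts each label per bin and stacks the label-1 bars on top of the label-0 bars via bottom.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace with the filtered x and y from the question.
rng = np.random.default_rng(2)
x = rng.random(300)
y = (rng.random(300) < x).astype(float)

edges = np.linspace(0, 1, 5)  # four bins: 0-0.25, ..., 0.75-1
counts0, _ = np.histogram(x[y == 0], bins=edges)
counts1, _ = np.histogram(x[y == 1], bins=edges)

ind = np.arange(len(edges) - 1)  # one x position per bin
width = 0.45

p1 = plt.bar(ind, counts0, width, color='r')
p2 = plt.bar(ind, counts1, width, color='y', bottom=counts0)  # stacked on p1

labels = [f"{lo:.2f}-{hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])]
plt.xticks(ind, labels)  # bars are centred on ind by default
plt.xlabel("Variable")
plt.ylabel("Count")
plt.legend((p1[0], p2[0]), ('label = 0', 'label = 1'))
plt.show()
```

Since matplotlib 2.0, plt.bar defaults to align='center', so the ticks sit at ind rather than ind + width/2 as in the older gallery example.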

logc answered Dec 02 '25 19:12


