I am trying to plot some data so I can visualise it, but I am having some trouble. I have two values. One is discrete (0 or 1) and is called label. The other is a continuous value anywhere between 0 and 1. I wish to create a histogram with a number of bars along the X axis, for example one for every 0.25 of the range, giving four bars: 0-0.25, 0.25-0.5, 0.5-0.75 and 0.75-1.
Each bar would then be split along the Y axis by whether label is 1 or 0, so we end up with a graph like this:

If there are any effective, intelligent ways to split up my data (rather than hardcoding four bars for these values), I would be interested in those too, though that probably warrants another question. I will post it once I have this code running.
I have both values stored in numpy arrays as follows, but am unsure how to plot a graph like this:
import numpy as np
import pylab as P
import matplotlib.pyplot as plt
variable_values = trainData.get_vector('variable') #returns one dimensional numpy array of vals
label_values = trainData.get_vector('label')
valid = variable_values != '?' #mask of entries that are not void
x = variable_values[valid].astype(float) #removing void vals
y = label_values[valid].astype(float) #same mask, so x and y stay aligned
fig = plt.figure()
plt.title("Feature breakdown histogram")
plt.xlabel("Variable")
plt.xlim(0, 1)
plt.ylabel("Label")
plt.ylim(0, 1)
xvals = np.arange(0, 1, .02) #step of 0.02; np.linspace takes a number of points, not a step
plt.show()
The matplotlib tutorial shows the following code to roughly achieve what I want, but I can't really understand how it works:
P.figure()
n, bins, patches = P.hist(x, 10, density=True, histtype='bar', stacked=True) #density replaces the older normed argument
P.show()
Any help is greatly appreciated. Thank you.
Edit:
I am now getting the error:
AssertionError: incompatible sizes: argument 'height' must be length 5 or scalar
I have printed my two numpy arrays and they are of equal length; one is discrete, the other continuous. Here is the code I am running:
x = variable_values[variable_values != '?'].astype(float)
y = label_values[label_values != '?'].astype(float)
print x #printing numpy arrays of equal size, x is continuous, y is discrete. Both of type float now.
print y
N = 5
ind = np.arange(N) # the x locations for the groups
width = 0.45 # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, y, width, color='r') #error occurs here
p2 = plt.bar(ind, x, width, color='y',
bottom=x)
plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind+width/2., ('G1', 'G2', 'G3', 'G4', 'G5') )
plt.yticks(np.arange(0,81,10))
plt.legend( (p1[0], p2[0]), ('Men', 'Women') )
plt.show()
I think this other tutorial from the same Matplotlib gallery will be much more revealing to you ...
Notice that the second series of data has an extra argument in the call: bottom
p1 = plt.bar(ind, menMeans, width, color='r', yerr=womenStd)
p2 = plt.bar(ind, womenMeans, width, color='y',
bottom=menMeans, yerr=menStd)
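Just replace menMeans and womenMeans with your own data, but note that each must hold one height per bar, i.e. the same length as ind. Passing the raw x and y arrays is what triggers the AssertionError about 'height'; pass the number of samples that fall in each bin for label 0 and for label 1 instead.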
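To make the bottom idea concrete for your case, here is a minimal sketch, not a definitive implementation. It assumes x and y are aligned float arrays as in your question. The helper name stacked_label_counts, the synthetic data and the output filename are made up for illustration. It bins x with np.histogram separately for each label, then stacks the two sets of counts with bottom.

```python
import numpy as np
import matplotlib.pyplot as plt


def stacked_label_counts(x, y, n_bins=4):
    """Count label-0 and label-1 samples in each equal-width bin of [0, 1].

    Hypothetical helper for illustration only.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)          # e.g. 0, 0.25, 0.5, 0.75, 1
    counts0, _ = np.histogram(x[y == 0], bins=edges)   # samples with label 0
    counts1, _ = np.histogram(x[y == 1], bins=edges)   # samples with label 1
    return edges, counts0, counts1


# Synthetic stand-ins for the real arrays (assumption, not your data).
rng = np.random.RandomState(0)
x = rng.rand(200)
y = (rng.rand(200) < x).astype(float)

edges, counts0, counts1 = stacked_label_counts(x, y, n_bins=4)
left = edges[:-1]             # left edge of every bar
width = edges[1] - edges[0]   # all bins have the same width

fig, ax = plt.subplots()
# One height per bar: counts0 and counts1 both have length n_bins.
ax.bar(left, counts0, width, align="edge", color="r", label="label 0")
ax.bar(left, counts1, width, align="edge", color="y",
       bottom=counts0, label="label 1")
ax.set_xlim(0, 1)
ax.set_xlabel("Variable")
ax.set_ylabel("Count")
ax.set_title("Feature breakdown histogram")
ax.legend()
fig.savefig("label_histogram.png")  # or call plt.show() for an interactive window
```

Changing n_bins gives finer or coarser bars without hardcoding the edges. A shorter route that should give a similar picture is to hand both subsets to hist directly: plt.hist([x[y == 0], x[y == 1]], bins=edges, stacked=True).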