I am trying to create a histogram that shows the number of ratings for each value of a discrete star rating (1-5). There should be one bar per value, and the x-axis should show only the numbers [1,2,3,4,5], each centered underneath its bar.
I tried setting the number of bins to 5 and setting their range to 0-7, but that creates bars that span across values (as in the image supplied).
This is the code I have tried (pandas and numpy):
df.stars.hist()
and
hist, bins = np.histogram(x1, bins=5)
ax.bar(bins[:-1], hist.astype(np.float32) / hist.sum(), width=(bins[1]-bins[0]), color="blue")
The histogram is a popular graphing tool. It is used to summarize discrete or continuous data that are measured on an interval scale. It is often used to illustrate the major features of the distribution of the data in a convenient form.
Bar charts should be used when you are showing segments of information. Vertical bar charts are useful to compare different categorical or discrete variables, such as age groups, classes, schools, etc., as long as there are not too many categories to compare.
If your data isn't continuous you have other options, and generally discrete numerical data or categorical data (either nominal or ordinal) can be graphed in the same way. With categorical or discrete data a bar chart is typically your best option.
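For star ratings, the bar-chart approach can be sketched as follows: count each value directly instead of binning, then draw one bar per value with a tick centered under it. This is only a minimal sketch. The helper names `star_counts` and `plot_star_counts` and the sample data are made up for illustration, and the plotting part assumes matplotlib is installed.

```python
from collections import Counter


def star_counts(ratings, values=range(1, 6)):
    """Return one count per star value, including values that never occur."""
    tally = Counter(ratings)
    return [tally.get(v, 0) for v in values]


def plot_star_counts(ratings, values=range(1, 6)):
    """Draw one bar per star value with a centered tick under each bar."""
    # Imported here so that star_counts can be used without matplotlib.
    import matplotlib.pyplot as plt

    values = list(values)
    counts = star_counts(ratings, values)
    fig, ax = plt.subplots()
    # align="center" puts each bar's middle exactly on its star value.
    ax.bar(values, counts, width=0.8, align="center", color="blue")
    ax.set_xticks(values)  # only 1..5 appear on the x-axis
    ax.set_xlabel("stars")
    ax.set_ylabel("number of ratings")
    return fig, ax


# Hypothetical sample data, not taken from the question.
sample = [5, 4, 5, 3, 1, 5, 4, 4, 2, 5]
print(star_counts(sample))  # [1, 1, 1, 3, 4]
```

Because the counts are computed per value rather than per bin, no bar can span two ratings, and a rating that never occurs still gets an (empty) slot on the axis.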
You can use the plot(kind='bar') method:
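from numpy.random import randint
from pandas import Series
# 100 random star ratings; randint's upper bound is exclusive, so values are 1-5
stars = Series(randint(1, 6, size=100))
# count each rating and order the counts by star value
vc = stars.value_counts().sort_index()
ax = vc.plot(kind='bar')
fig = ax.get_figure()
fig.autofmt_xdate()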
to get:
EDIT #1: To show them as proportions, divide by the sum before plotting:
vc = vc / float(vc.sum())
assert abs(vc.sum() - 1) < 1e-9  # compare with a tolerance; floating-point sums are rarely exactly 1
to get:
EDIT #2: To show them as percentages, divide by the sum as above and use the format spec mini-language to format the y-axis tick labels:
# read the numeric tick positions; the label text can be empty until the figure is drawn
new_labels = ['{0:.0%}'.format(y) for y in ax.get_yticks()]
ax.set_yticklabels(new_labels)
to get:
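As an alternative to relabelling the ticks by hand, matplotlib's `PercentFormatter` (in `matplotlib.ticker`) can format the y-axis as percentages. The sketch below is not part of the original answer. The helper names `to_proportions` and `plot_star_percentages` and the sample counts are hypothetical, and the plotting part assumes matplotlib 2.1 or later.

```python
def to_proportions(counts):
    """Scale counts so they sum to 1 (assumes at least one rating)."""
    total = float(sum(counts))
    return [c / total for c in counts]


def plot_star_percentages(counts, values=range(1, 6)):
    """Bar chart of rating shares with a percentage y-axis."""
    # Imported here so that to_proportions can be used without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    values = list(values)
    fig, ax = plt.subplots()
    ax.bar(values, to_proportions(counts), align="center")
    ax.set_xticks(values)
    # xmax=1.0 tells the formatter that a value of 1.0 means 100%.
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=0))
    return fig, ax


# Hypothetical counts for 1..5 stars.
print(to_proportions([1, 1, 1, 3, 4]))  # [0.1, 0.1, 0.1, 0.3, 0.4]
```

A formatter is applied whenever the axis is drawn, so the labels stay correct if the axis limits change later, which fixed label strings do not.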