I have data in the following format (here, imported from a CSV file). You can find an example dataset as CSV here.
PAIR PREFERENCE
1 5
1 3
1 2
2 4
2 1
2 3
… and so on. In total, there are 19 pairs, and the PREFERENCE
ranges from 1
to 5
, as discrete values.
What I need is a stacked histogram, e.g. a 100% high column, for each pair, indicating the distribution of the PREFERENCE
values.
Something similar to the "100% stacked columns" in Excel, or (although not quite the same, a so-called "mosaic plot"):
I figured it'd be easiest using ggplot2
, but I don't even know where to start. I know I can create a simple bar chart with something like:
ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")
… that however doesn't get me very far. So I tried this, and it gets me somewhat closer to what I'm trying to achieve, but it still uses the count of PREFERENCE
, I suppose? Note the ylab
being "count" here, and the values ranging to 19.
qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))
Results in:
By the way, this is not really related to this question, and only marginally related to this (i.e. probably same idea, but not continuous values, instead grouped into bars).
A 100% stacked column chart is an Excel chart type meant to show the relative percentage of multiple data series in stacked columns, where the total (cumulative) of stacked columns always equals 100%.
The scale is the same throughout each category's histogram, making them easily comparable. The shape, center, and spread of each histogram can be used for analysis of the data. A stacked histogram is one of two popular ways to display data for a single categorical variable and a single quantitative variable.
Stacked Bar Graphs are used to show how a larger category is divided into smaller categories and what the relationship of each part has on the total amount.
Maybe you want something like this:
ggplot() +
geom_bar(data = dat,
aes(x = factor(PAIR),fill = factor(PREFERENCE)),
position = "fill")
where I've read your data into dat
. This outputs something like this:
The y label is still "count", but you can change that manually by adding:
+ scale_x_discrete("Pairs") + scale_y_continuous("Votes")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With