Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I create a (100%) stacked histogram in R?

My dataset:

I have data in the following format (here, imported from a CSV file). You can find an example dataset as CSV here.

PAIR   PREFERENCE
1      5
1      3
1      2
2      4
2      1
2      3

… and so on. In total, there are 19 pairs, and the PREFERENCE ranges from 1 to 5, as discrete values.


What I'm trying to achieve:

What I need is a stacked histogram, e.g. a 100% high column, for each pair, indicating the distribution of the PREFERENCE values.

Something similar to the "100% stacked columns" in Excel, or (although not quite the same, a so-called "mosaic plot"):


What I tried:

I figured it'd be easiest using ggplot2, but I don't even know where to start. I know I can create a simple bar chart with something like:

ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")

… that however doesn't get me very far. So I tried this, and it gets me somewhat closer to what I'm trying to achieve, but it still uses the count of PREFERENCE, I suppose? Note the ylab being "count" here, and the values ranging to 19.

qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))

Results in:

enter image description here

  • So, what do I have to do to get the stacked bars to represent a histogram?
  • Or do they actually do this already?
  • If so, what do I have to change to get the labels right (e.g. have percentages instead of the "count")?

By the way, this is not really related to this question, and only marginally related to this (i.e. probably same idea, but not continuous values, instead grouped into bars).

like image 266
slhck Avatar asked Jan 06 '12 12:01

slhck


People also ask

What is a 100% stacked chart?

A 100% stacked column chart is an Excel chart type meant to show the relative percentage of multiple data series in stacked columns, where the total (cumulative) of stacked columns always equals 100%.

Can a histogram be stacked?

The scale is the same throughout each category's histogram, making them easily comparable. The shape, center, and spread of each histogram can be used for analysis of the data. A stacked histogram is one of two popular ways to display data for a single categorical variable and a single quantitative variable.

What does a stacked histogram show?

Stacked Bar Graphs are used to show how a larger category is divided into smaller categories and what the relationship of each part has on the total amount.


1 Answers

Maybe you want something like this:

ggplot() + 
    geom_bar(data = dat,
             aes(x = factor(PAIR),fill = factor(PREFERENCE)),
             position = "fill")

where I've read your data into dat. This outputs something like this:

enter image description here

The y label is still "count", but you can change that manually by adding:

+ scale_x_discrete("Pairs") + scale_y_continuous("Votes")
like image 103
joran Avatar answered Oct 30 '22 21:10

joran