
R stacked barchart with aggregate data

I'm having trouble creating a stacked barchart with aggregate data. When dealing with aggregate tables from other people's reports I generally use Excel, but I'd like to start doing all my charts in R, possibly with lattice or ggplot. In Excel, a stacked barchart of the following aggregate data takes a couple of clicks (Insert, Column Charts, Stacked Column), and you get something like this:

[image: Excel stacked column chart]

Besides wanting to do this chart in R, I also want to use ggplot's faceting, i.e. put two stacked barcharts side by side to compare two groups (A and B). I've played around with other charts and this seems the best choice. This is the data. The Excel chart only shows group A (the numbers are percentages).

D<-as.data.frame(structure(list(Group = c("A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B"
), Education = c("NVQ Level 4 and above", "NVQ Level3", "NVQ Level 2", 
"Below NVQ Level 2", "Other qualification", "No qualification", 
"NVQ Level 4 and above", "NVQ Level3", "NVQ Level 2", "Below NVQ Level 2", 
"Other qualification", "No qualification"), Full.Time = c(47, 
27, 23, 17, 18, 9, 36, 26, 22, 22, 27, 12), PT.16.hours = c(20, 
24, 22, 18, 18, 12, 22, 21, 21, 22, 14, 10), PT.16.hours.1 = c(12, 
11, 10, 11, 13, 5, 24, 25, 25, 20, 16, 12)), .Names = c("Group", 
"Education", "Full.Time", "PT>16.hours", "PT<16.hours")))

Before getting to the faceting to show the difference between the two groups, I'm actually having trouble creating a single stacked bar chart (like the one above) with ggplot2. I'm guessing I shouldn't have 3 variables (Full.Time, PT>16 hours, PT<16 hours), but rather a single row for each case, so instead of having

A    NVQ Level 4 and above      47  20  12
A    NVQ Level3                 27  24  11

I should have

Group  Education              Work         Percentage
A      NVQ Level 4 and above  Full Time    47
A      NVQ Level 4 and above  PT>16 hours  20

If this is the only way to get ggplot to do the chart, how would you change from one format to the other with a few lines of code? I often come across this type of data, so it would be good to have a standardised procedure. I've also played around with ggplot's 'identity' option but haven't had much success.

Any help would be much appreciated.

Thanks

Marco M asked Jul 28 '12 19:07


2 Answers

Reshape your data:

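library(reshape2)
library(ggplot2)
# stack the three work columns into variable/value pairs
df <- melt(D, id.vars = c("Group", "Education"))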

And simply plot it :)

ggplot(df, aes(x = factor(Education), y = value, fill = factor(variable))) +
  geom_bar(stat = "identity") + facet_grid(. ~ Group) +
  ylab('') + xlab('') + ggtitle('') + scale_fill_discrete('') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

Here the first line sets the aesthetics, the second line adds the bar layer and the facet, the 3rd line removes unwanted text from the plot, the 4th line sets the b&w theme, and the last line rotates the x-axis labels.


daroczig answered Oct 29 '22 15:10


The trick is to use melt from the reshape package to melt down the three measured columns into one (a new column named value), along with an identifying column (named variable) to group on:

require(ggplot2)
require(reshape)

# first we need to get Full.Time, PT.16, etc. into one column
df <- melt(D, measure.vars = c("Full.Time", "PT.16.hours", "PT.16.hours.1"))
ggplot(df, aes(x=Education, y=value, fill=variable )) +
  geom_bar(stat="identity")

The rest is just reordering factors so output matches what you want.
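As a rough sketch of that reordering step: ggplot2 sorts a character column alphabetically, so one option is to turn Education into a factor whose levels follow the order in the original table. The snippet rebuilds the question's data so it runs on its own, and it assumes reshape2 is installed (reshape's melt should give the same layout here). The legend labels are only my reading of the original column names PT>16.hours and PT<16.hours.

```r
library(reshape2)  # assumption: reshape2 is available

# Same numbers as the question's data, rebuilt so the snippet is self-contained
D <- data.frame(
  Group = rep(c("A", "B"), each = 6),
  Education = rep(c("NVQ Level 4 and above", "NVQ Level3", "NVQ Level 2",
                    "Below NVQ Level 2", "Other qualification",
                    "No qualification"), times = 2),
  Full.Time     = c(47, 27, 23, 17, 18, 9, 36, 26, 22, 22, 27, 12),
  PT.16.hours   = c(20, 24, 22, 18, 18, 12, 22, 21, 21, 22, 14, 10),
  PT.16.hours.1 = c(12, 11, 10, 11, 13, 5, 24, 25, 25, 20, 16, 12),
  stringsAsFactors = FALSE
)

df <- melt(D, id.vars = c("Group", "Education"))

# Keep the education levels in the order they first appear in the data
# (otherwise the x axis is sorted alphabetically)
df$Education <- factor(df$Education, levels = unique(D$Education))

# Optional: more readable legend labels (assumed wording, see note above)
levels(df$variable) <- c("Full time", "PT > 16 hours", "PT < 16 hours")
```

With df prepared like this, the ggplot() calls in this answer should draw the bars in the table's order rather than alphabetically.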

Take a look at df to see what melt ends up doing, since it's a common workflow for ggplot2.
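For instance, here is a small sketch using just the first two rows shown in the question. D_small and df_small are names made up for this illustration, and reshape2 is assumed (reshape's melt should produce the same shape).

```r
library(reshape2)  # assumption: reshape2 is available

# First two rows of the question's data (D_small is a hypothetical name)
D_small <- data.frame(
  Group = c("A", "A"),
  Education = c("NVQ Level 4 and above", "NVQ Level3"),
  Full.Time = c(47, 27),
  PT.16.hours = c(20, 24),
  PT.16.hours.1 = c(12, 11),
  stringsAsFactors = FALSE
)

df_small <- melt(D_small, id.vars = c("Group", "Education"))

dim(D_small)   # 2 rows, 5 columns (wide)
dim(df_small)  # 6 rows, 4 columns (long)

# One row per work category for this education level: values 47, 20, 12
df_small[df_small$Education == "NVQ Level 4 and above", ]
```

Each wide row becomes three long rows, one per measured column, which is the Group / Education / Work / Percentage layout the question asks for (with the last two columns named variable and value).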

[image: plot]

Moving on to a faceted plot using the Group factor just requires adding the appropriate facet_wrap:

ggplot(df, aes(x=Education, y=value, fill=variable )) +
  geom_bar(stat="identity") +
  facet_wrap(~ Group)

[image: faceted plot]

mindless.panda answered Oct 29 '22 14:10