I'm struggling get the right ordering of variables in a graph I made with ggplot2 in R.
Suppose I have a dataframe such as:
set.seed(1234)
my_df<- data.frame(matrix(0,8,4))
names(my_df) <- c("year", "variable", "value", "vartype")
my_df$year <- rep(2006:2007)
my_df$variable <- c(rep("VX",2),rep("VB",2),rep("VZ",2),rep("VD",2))
my_df$value <- runif(8, 5,10)
my_df$vartype<- c(rep("TA",4), rep("TB",4))
which yields the following table:
year variable value vartype
1 2006 VX 5.568517 TA
2 2007 VX 8.111497 TA
3 2006 VB 8.046374 TA
4 2007 VB 8.116897 TA
5 2006 VZ 9.304577 TB
6 2007 VZ 8.201553 TB
7 2006 VD 5.047479 TB
8 2007 VD 6.162753 TB
There are four variables (VX, VB, VZ and VD), belonging to two groups of variable types, (TA and TB).
I would like to plot the values as horizontal bars on the y axis, ordered vertically first by variable groups and then by variable names, faceted by year, with values on the x axis and fill colour corresponding to variable group. (i.e. in this simplified example, the order should be, top to bottom, VB, VX, VD, VZ)
1) My first attempt has been to try the following:
ggplot(my_df,
aes(x=variable, y=value, fill=vartype, order=vartype)) +
# adding or removing the aesthetic "order=vartype" doesn't change anything
geom_bar() +
facet_grid(. ~ year) +
coord_flip()
However, the variables are listed in reverse alphabetical order, but not by vartype : the order=vartype
aesthetic is ignored.
2) Following an answer to a similar question I posted yesterday, i tried the following, based on the post Order Bars in ggplot2 bar graph :
my_df$variable <- factor(
my_df$variable,
levels=rev(sort(unique(my_df$variable))),
ordered=TRUE
)
This approach does gets the variables in vertical alphabetical order in the plot, but ignores the fact that the variables should be ordered first by variable goups (with TA-variables on top and TB-variables below).
3) The following gives the same as 2 (above):
my_df$vartype <- factor(
my_df$vartype,
levels=sort(unique(my_df$vartype)),
ordered=TRUE
)
... which has the same issues as the first approach (variables listed in reverse alphabetical order, groups ignored)
4) another approach, based on the original answer to Order Bars in ggplot2 bar graph , also gives the same plat as 2, above
my_df <- within(my_df,
vartype <- factor(vartype,
levels=names(sort(table(vartype),
decreasing=TRUE)))
)
I'm puzzled by the fact that, despite several approaches, the aesthetic order=vartype
is ignored. Still, it seems to work in an unrelated problem: http://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/
I hope that the problem is clear and welcome any suggestions.
Matteo
I posted a similar question yesterday, but, unfortunately I made several mistakes when descrbing the problem and providing a reproducible example. I've listened to several suggestions since, and thoroughly searched stakoverflow for similar question and applied, to the best of my knowledge, every suggested combination of solutions, to no avail. I'm posting the question again hoping to be able to solve my issue and, hopefully, be helpful to others.
This has little to do with ggplot, but is instead a question about generating an ordering of variables to use to reorder the levels of a factor. Here is your data, implemented using the various functions to better effect:
set.seed(1234)
df2 <- data.frame(year = rep(2006:2007),
variable = rep(c("VX","VB","VZ","VD"), each = 2),
value = runif(8, 5,10),
vartype = rep(c("TA","TB"), each = 4))
Note that this way variable
and vartype
are factors. If they aren't factors, ggplot()
will coerce them and then you get left with alphabetical ordering. I have said this before and will no doubt say it again; get your data into the correct format first before you start plotting / doing data analysis.
You want the following ordering:
> with(df2, order(vartype, variable))
[1] 3 4 1 2 7 8 5 6
where you should note that we get the ordering by vartype
first and only then by variable
within the levels of vartype
. If we use this to reorder the levels of variable
we get:
> with(df2, reorder(variable, order(vartype, variable)))
[1] VX VX VB VB VZ VZ VD VD
attr(,"scores")
VB VD VX VZ
1.5 5.5 3.5 7.5
Levels: VB VX VD VZ
(ignore the attr(,"scores")
bit and focus on the Levels). This has the right ordering, but ggplot()
will draw them bottom to top and you wanted top to bottom. I'm not sufficiently familiar with ggplot()
to know if this can be controlled, so we will also need to reverse the ordering using decreasing = TRUE
in the call to order()
.
Putting this all together we have:
## reorder `variable` on `variable` within `vartype`
df3 <- transform(df2, variable = reorder(variable, order(vartype, variable,
decreasing = TRUE)))
Which when used with your plotting code:
ggplot(df3, aes(x=variable, y=value, fill=vartype)) +
geom_bar() +
facet_grid(. ~ year) +
coord_flip()
produces this:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With