
R: handling and plotting grouped data

Tags: r, ggplot2

This is a follow-up question to this one: R: plot multiple lines in one graph

There I used part of my data to draw one graph with multiple lines. Now I want to draw multiple graphs in one grid, since I have grouped data. Right now I'm doing this by creating a data frame for each group of data, creating a graph for each data frame, and combining those using grid.arrange(). However, I'm wondering whether I could handle the grouped data as one dataset instead of creating all those separate tables.

The data I have is structured like this:

          Category1    Category2    Category3
Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410

So is there any way to process this at once and plot the 3 graphs (one for each Category), with a line for each Company across the Years (2011 and 2013)?

Chrisvdberge asked Jun 17 '13 15:06



1 Answer

You should definitely learn how to structure your data and how to make a reproducible example. It's really hard to deal with data in such an unstructured format, not only for you but also for us.

mdf <- read.table( text="Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410", header = TRUE, check.names=FALSE )

library("reshape2")
cat1 <- melt(mdf[c(1,2,3)], id.vars="Company", value.name="value", variable.name="Year")
cat1$Category <- "Category1"
cat2 <- melt(mdf[c(1,4,5)], id.vars="Company", value.name="value", variable.name="Year")
cat2$Category <- "Category2"
cat3 <- melt(mdf[c(1,6,7)], id.vars="Company", value.name="value", variable.name="Year")
cat3$Category <- "Category3"
mdf <- rbind(cat1, cat2, cat3)

head(mdf)
   Company Year value  Category
1 Company1 2011   300 Category1
2 Company2 2011   320 Category1
3 Company3 2011   310 Category1
4 Company1 2013   350 Category1
5 Company2 2013   430 Category1
6 Company3 2013   420 Category1

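This can of course be automated if the number of categories is very large. Note that the call below expects the original wide mdf from read.table, so run it instead of the manual steps above, not after them: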

library( "plyr" )
mdf <- adply( c(1:3), 1, function( cat ){
  tmp <- melt(mdf[ c(1, cat*2, cat*2+1) ], id.vars="Company", value.name="value", variable.name="Year")
  tmp$Category <- paste0("Category", cat)
  return(tmp)
} )

But if you can avoid pushing all this data back and forth in the first place, you should do so.
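For comparison, the same wide-to-long reshaping can be sketched in base R, without reshape2 or plyr. This is only an illustration: the names wide, long, years and n_cat are made up here, and it assumes that every category contributes exactly two year columns, in the order shown in the question's table.

```r
# Sketch: wide-to-long reshaping in base R (no reshape2/plyr needed).
# `wide`, `long`, `years` and `n_cat` are illustrative names.
wide <- read.table(text = "Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410", header = TRUE, check.names = FALSE)

years <- names(wide)[-1]     # "2011" "2013", repeated once per category
n_cat <- length(years) / 2   # assumption: exactly two year columns per category

long <- data.frame(
  # companies repeat once per value column
  Company  = rep(wide$Company, times = length(years)),
  # each column header labels all rows of that column
  Year     = rep(years, each = nrow(wide)),
  # consecutive pairs of columns belong to the same category
  Category = rep(paste0("Category", rep(seq_len(n_cat), each = 2)), each = nrow(wide)),
  # stack the value columns on top of each other, column by column
  value    = unlist(wide[-1], use.names = FALSE)
)

head(long)
```

The resulting long has the same four columns as the melted data above, so it should be usable in the plotting code in the same way.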

Using facets

ggplot2 has built-in support for faceted plots, which display data of the same type in separate panels when the data can be subset by one (or multiple) variables. See ?facet_wrap or ?facet_grid.

library("ggplot2")
# one panel per Category, one line per Company
ggplot(data=mdf, aes(x=Year, y=value, group = Company, colour = Company)) +
    geom_line() +
    geom_point( size=4, shape=21, fill="white") +
    facet_wrap( "Category" )


Getting individual plots

Alternatively, you can subset your data.frame by the corresponding variable and store the individual plots in a list:

library("plyr")
ll <- dlply( mdf, "Category", function(x){
        ggplot(data=x, aes(x=Year, y=value, group = Company, colour = Company)) +
          geom_line() +
          geom_point( size=4, shape=21, fill="white")
})
ll[["Category1"]]
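Since the question mentions grid.arrange(), here is a rough sketch of how a list of per-category plots might be combined into one grid. It assumes the gridExtra package is installed, uses base split() and lapply() instead of dlply() to stay short, and rebuilds the long data by hand from the table in the question; long, plots and combined are illustrative names.

```r
library(ggplot2)
library(gridExtra)  # assumption: gridExtra is installed

# Long-format data typed in from the question's table (illustrative name `long`)
long <- data.frame(
  Company  = rep(paste0("Company", 1:3), times = 6),
  Year     = rep(rep(c("2011", "2013"), each = 3), times = 3),
  Category = rep(paste0("Category", 1:3), each = 6),
  value    = c(300, 320, 310, 350, 430, 420,
               290, 305, 400, 300, 301, 305,
               295, 300, 400, 290, 400, 410)
)

# One plot per Category, stored in a named list
plots <- lapply(split(long, long$Category), function(d) {
  ggplot(d, aes(x = Year, y = value, group = Company, colour = Company)) +
    geom_line() +
    geom_point(size = 4, shape = 21, fill = "white") +
    ggtitle(d$Category[1])
})

# Draw all plots side by side; the arranged object is returned invisibly
combined <- grid.arrange(grobs = plots, ncol = 3)
```

Compared with facets, this repeats the legend in every panel, so it mainly makes sense when the individual plots need to differ from each other.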
Beasterfield answered Sep 22 '22 05:09
