
arrange multiple graphs using a for loop in ggplot2

I want to produce a pdf which shows multiple graphs, one for each NetworkTrackingPixelId. I have a data frame similar to this:

> head(data)
  NetworkTrackingPixelId                           Name       Date Impressions
1                   2421                    Rubicon RTB 2014-02-16      168801
2                   2615                     Google RTB 2014-02-16     1215235
3                   3366                      OpenX RTB 2014-02-16      104419
4                   3606                   AppNexus RTB 2014-02-16      170757
5                   3947                   Pubmatic RTB 2014-02-16       68690
6                   4299            Improve Digital RTB 2014-02-16         701

I was thinking of using a script similar to the one below:

# create a vector which stores the NetworkTrackingPixelIds
tp <- data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId)

# create a for loop to print the line graphs
for (i in tp) {
      print(ggplot(data[which(data$NetworkTrackingPixelId == i), ], aes(x = Date, y = Impressions)) + geom_point() + geom_line())
    }

I was expecting this command to produce many graphs, one for each NetworkTrackingPixelId. Instead, the result is a single graph that aggregates all the NetworkTrackingPixelIds.

Another thing I've noticed is that the variable tp is not a real vector.

> is.vector(tp)
[1] FALSE

Even if I force it with as.vector, the result is the same:

tp <- as.vector(data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId))
> is.vector(tp)
[1] FALSE
> str(tp)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1397 obs. of  1 variable:
 $ NetworkTrackingPixelId: int  2421 2615 3366 3606 3947 4299 4429 4786 6046 6286 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol NetworkTrackingPixelId
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")=List of 63
  ..$ : int  24 69 116 162 205 253 302 351 402 454 ...
  ..$ : int  1 48 94 140 184 232 281 330 380 432 ...

[I've cut part of this output]

 - attr(*, "group_sizes")= int  29 29 2 16 29 1 29 29 29 29 ...
 - attr(*, "biggest_group_size")= int 29
 - attr(*, "labels")='data.frame':  63 obs. of  1 variable:
  ..$ NetworkTrackingPixelId: int  8799 2615 8854 8869 4786 7007 3947 9109 9126 9137 ...
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol NetworkTrackingPixelId
Gianluca asked Mar 17 '14 20:03


2 Answers

Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are the finest examples of the split-apply-combine paradigm in rstats. Let me explain:

Step 1: Split data by gear

  • dplyr uses the function group_by
  • data.table uses argument by

Step 2: Apply a function

  • dplyr uses do, which runs your code once for each piece of the data.
  • data.table evaluates the expression in j once per piece, with column names referring to that piece's values.

Step 3: Combine

There is no combine step here, since we are saving the charts created to file.

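library(dplyr)
library(ggplot2)
# Newer dplyr versions dropped %.% in favor of %>%, and do() now takes an
# expression in which `.` stands for the current group's rows.
mtcars %>%
  group_by(gear) %>%
  do(saved = ggsave(
    filename = sprintf("gear_%s.pdf", unique(.$gear)), plot = qplot(wt, mpg, data = .)
  ))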

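library(data.table)
library(ggplot2)
mtcars_dt <- data.table(mtcars)
# j runs once per gear: .SD is that group's rows, .BY holds the group's key
mtcars_dt[, {
  ggsave(filename = sprintf("gear_%s.pdf", .BY$gear), plot = qplot(wt, mpg, data = .SD))
  NULL
}, by = gear]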
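
For comparison, the same split and apply steps can be spelled out in base R. This is only a sketch of the pattern, not part of either package's API: the name out_dir and the use of tempdir() are assumptions made for illustration, and ggplot() replaces qplot() here.

```r
library(ggplot2)

# Step 1 (split): one data frame per gear value
pieces <- split(mtcars, mtcars$gear)

# Step 2 (apply): build and save one plot per piece.
# out_dir is a hypothetical output location chosen for this sketch.
out_dir <- tempdir()
files <- vapply(names(pieces), function(g) {
  p <- ggplot(pieces[[g]], aes(x = wt, y = mpg)) + geom_point()
  path <- file.path(out_dir, sprintf("gear_%s.pdf", g))
  ggsave(filename = path, plot = p, width = 7, height = 7)
  path
}, character(1))
```

As in the versions above, nothing is combined afterwards; files only records where each chart was written.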

UPDATE: To save all plots into one PDF, here is a quick solution.

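library(dplyr)
library(ggplot2)
# do() stores one ggplot object per gear in the list column `plot`
plots <- mtcars %>%
  group_by(gear) %>%
  do(plot = qplot(wt, mpg, data = .))

# every print() adds a page to the open PDF device
pdf("all.pdf")
invisible(lapply(plots$plot, print))
dev.off()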
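
Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The likely reason the original loop drew a single graph is that for (i in tp) iterates over the columns of a data frame, so i was the whole id column at once rather than one id. The data values and the out_file path here are made up for illustration; only the column names come from the question.

```r
library(ggplot2)

# Toy stand-in for the asker's data: same column names, made-up values
data <- data.frame(
  NetworkTrackingPixelId = rep(c(2421L, 2615L, 3366L), each = 3),
  Date = rep(as.Date("2014-02-16") + 0:2, times = 3),
  Impressions = c(10, 12, 11, 100, 90, 95, 5, 7, 6)
)

# unique() on the column gives a plain vector, so `for` visits one id at a time
ids <- unique(data$NetworkTrackingPixelId)

out_file <- file.path(tempdir(), "impressions_by_id.pdf")  # hypothetical path
pdf(out_file)
for (id in ids) {
  piece <- data[data$NetworkTrackingPixelId == id, ]
  p <- ggplot(piece, aes(x = Date, y = Impressions)) +
    geom_point() +
    geom_line() +
    ggtitle(paste("NetworkTrackingPixelId", id))
  print(p)  # each print() starts a new page in the open pdf device
}
dev.off()
```

Pulling the ids out with unique(data$NetworkTrackingPixelId) sidesteps the grouped_df returned by the dplyr pipeline, which is why is.vector(tp) was FALSE.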
Ramnath answered Nov 04 '22 17:11


I recently had a project that required producing a lot of individual PNGs, one for each record, and I got a huge speedup from some pretty simple parallelization. I am not sure whether this is more performant than the dplyr or data.table technique, but it may be worth trying:

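library(foreach)
library(doParallel)
library(ggplot2)

workers <- makeCluster(4)
registerDoParallel(workers)
dir.create("images", showWarnings = FALSE)  # png() fails if the folder is missing

# One task per gear value; iterating over rows would make several workers
# overwrite the same gear-named file.
foreach(g = unique(mtcars$gear), .packages = "ggplot2") %dopar% {
  p <- qplot(wt, mpg, data = mtcars[mtcars$gear == g, ])
  png(file.path(getwd(), "images", paste0(g, ".png")))
  print(p)
  dev.off()
}
stopCluster(workers)  # release the worker processes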
JBecker answered Nov 04 '22 15:11