
Multiple ggplot2 graphs with shared Data

How do I make multiple plots of the same data but colored differently by different factors (columns) while recycling data? Is this what gridExtra does differently than cowplot?

Objective: My objective is to visually compare different results of clustering the same data efficiently. I currently believe the easiest way to compare 2-4 clustering algorithms visually is to have them plotted next to each other.

Thus, how do I plot the same data side by side colored differently?

Challenge/Specifications: Performance is very important. I have roughly 30,000 graphs to make, each with 450-480 points. It is critical that the data is "recycled," i.e. shared between the plots rather than copied for each one.

I am able to plot them side by side using the cowplot and gridExtra packages. I only started using gridExtra today, but it seems to recycle data and to suit my purposes better than cowplot. Update: u/eipi10 demonstrated that facet_wrap could work if I gathered the columns before plotting.

Set up

    #Packages
    library(ggplot2)
    library(cowplot)
    library(gridExtra)
    library(pryr) #memory profile

    #Data creation
    x.points  <- c(1, 1, 1, 3, 3, 3, 5, 5, 5)
    y.points  <- c(1, 3, 5, 1, 3, 5, 1, 3, 5)
    cl_vert   <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
    cl_hoz    <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
    cl_cent   <- c("A", "A", "A", "A", "B", "A", "A", "A", "A")
    df <- data.frame(x.points, y.points, cl_vert, cl_hoz, cl_cent)

Graphing them

    #Graph function and individual plots
    graph <- function(data = df, Title = "", color.by, legend.position = "none"){
      ggplot(data, aes(x = `x.points`, y = `y.points`)) +
        geom_point(aes(color = as.factor(color.by))) +
        scale_color_brewer(palette = "Set1") +
        labs(subtitle = Title, x = "log(X)", y = "log(Y)", color = "Color") +
        theme_bw() +
        theme(legend.position = legend.position)
    }

    g1 <- graph(Title = "Vertical", color.by = cl_vert)
    g2 <- graph(Title = "Horizontal", color.by = cl_hoz)
    g3 <- graph(Title = "Center", color.by = cl_cent)

    #Cowplot
    legend <- get_legend(graph(color.by = cl_vert, legend.position = "right")) #Not a memory waste
    plot <- plot_grid(g1, g2, g3, labels = c("A", "B", "C"))
    title <- ggdraw() + draw_label(paste0("Data Ex ", "1"), fontface = 'bold')
    plot2 <- plot_grid(title, plot, ncol = 1, rel_heights = c(0.1, 1)) # rel_heights values control title margins
    plot3 <- plot_grid(plot2, legend, rel_widths = c(1, 0.3))
    plot3

    #gridExtra
    plot_grid.ex <- grid.arrange(g1, g2, g3, ncol = 2, top = paste0("Data Ex ", "1"))
    plot_grid.ex

Memory usage with pryr

    #Comparison
    object_size(plot_grid) #315 kB (note: plot_grid is cowplot's function; the gridExtra result above is plot_grid.ex)
    object_size(plot3) #1.45 MB
    #Individual objects
    object_size(g1) #756 kB
    object_size(g2) #756 kB
    object_size(g3) #756 kB
    object_size(g1, g2, g3) #888 kB
    object_size(legend) #43.6 kB

Additional Questions: After writing this question and providing sample data, I just remembered gridExtra, tried it, and it seems to take up less memory than the combined data of its component graphs. I thought g1, g2, and g3 shared the same data except for the coloring assignment, which was why there was roughly 130 kB difference between the individual components and the total object size. How is it that plot_grid takes up even less space than that? ls.str(plot_grid) doesn't seem to show any consolidation of g1, g2, and g3. Would my best bet be to use lineprof() and run line by line comparisons?
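The gap between the individual sizes and the combined size can be illustrated with a minimal sketch. It assumes only that pryr is installed; the objects `x`, `a`, and `b` are hypothetical stand-ins for the data shared by the plots, not the actual ggplot objects. When several objects are passed to `object_size()` together, components they share are counted once, so the combined figure comes out smaller than the sum of the individual figures.

```r
library(pryr)

# Hypothetical stand-in for data that several plot objects point to
x <- runif(1e5)

# Two lists that both reference the same vector x (no copy is made)
a <- list(data = x, label = "first")
b <- list(data = x, label = "second")

size_a    <- object_size(a)
size_b    <- object_size(b)
size_both <- object_size(a, b)  # shared vector x is counted only once

print(size_a)
print(size_b)
print(size_both)
```

The same reasoning would apply to `object_size(g1, g2, g3)` if the three plots reference the same underlying data, though what exactly is shared inside a ggplot object is not something this sketch demonstrates.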

Sources I've skimmed/read/consulted:

  • http://adv-r.had.co.nz/memory.html #don't fully understand
  • Add a common Legend for combined ggplots #to fix gridExtra later

Please bear with me, as I am a new programmer (I only really started scripting in December); I don't understand all the technical details yet, but I want to.

asked Feb 23 '26 15:02 by A Duv


1 Answer

Faceting will work here if you convert your data to long format. Here's an example:

    library(tidyverse)

    df %>% gather(method, cluster, cl_vert:cl_cent) %>%
      ggplot(aes(x = x.points, y = y.points)) +
        geom_point(aes(color = cluster)) +
        scale_color_brewer(palette = "Set1") +
        theme_bw() +
        facet_wrap(~ method)
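As a side note, `gather()` has been superseded in tidyr by `pivot_longer()`. A rough equivalent of the reshaping above, assuming tidyr 1.0.0 or later and the same sample data as in the question, might look like the sketch below. The intermediate names `long` and `p` are only for illustration.

```r
library(ggplot2)
library(tidyr)

# Sample data from the question, rebuilt so the block is self-contained
df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# One row per point per clustering method
long <- pivot_longer(
  df,
  cols      = cl_vert:cl_cent,
  names_to  = "method",
  values_to = "cluster"
)

# One panel per clustering method, all drawn from a single data frame
p <- ggplot(long, aes(x = x.points, y = y.points)) +
  geom_point(aes(color = cluster)) +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  facet_wrap(~ method)

print(p)
```

Because all panels come from one plot object built on one data frame, there is a single copy of the reshaped data per figure rather than one plot object per clustering method.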


answered Feb 25 '26 07:02 by eipi10


