
Data driven plot names in data.table

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:

# Generate dummy data
require(data.table)

set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10), 
 x = rnorm(30, mean=5, sd=1), 
 y = rnorm(30, mean=8, sd=1))

setkey(DT, grp)

The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:

# Example of plotting all groups in one plot
require(ggplot2)

p <- ggplot(data=DT, aes(x = x, y = y)) + 
  aes(shape = factor(grp))+
  geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
  labs(title = "Group: ALL")
p

This creates the following plot: [image: plot of all groups]

Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:

.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()

That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.

Stan asked Jan 15 '14 23:01



1 Answer

Here is the data.table solution, though again, not what I would recommend:

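# dat receives .SD (the rows of the current group); grp.name receives .BY
make_plot <- function(dat, grp.name) {
  # print() is required: ggplot objects are not auto-printed inside j
  print(
    ggplot(dat, aes(x=x, y=y)) + 
    geom_point() + labs(title=paste0("Group: ", grp.name$grp))
  )
  NULL  # return NULL so the call adds no columns to the result
}
DT[, make_plot(.SD, .BY), by=grp]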
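
To see what j actually receives for each group, here is a minimal sketch that skips the plotting and just reports the pieces. The data is rebuilt from the question; the column names title, n_rows and sd_cols are illustrative and not part of the original answer.

```r
library(data.table)

# Same dummy data as in the question
set.seed(222)
DT <- data.table(grp = rep(c("a", "b", "c"), each = 10),
                 x = rnorm(30, mean = 5, sd = 1),
                 y = rnorm(30, mean = 8, sd = 1))

# Within each group, .BY is a named list holding the current by value,
# and .SD is the group's rows without the by column itself.
info <- DT[, .(title   = paste0("Group: ", .BY$grp),
               n_rows  = nrow(.SD),
               sd_cols = paste(names(.SD), collapse = ",")),
           by = grp]
print(info)
```

Each row of info corresponds to one call of j, so the title column shows the string that would end up as the plot title for that group.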

What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
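
For comparison, one way to get per-group plots without relying on side effects inside j is to split the table and keep the plot objects in a named list. This is only a sketch under my own assumptions (it is not necessarily what @dmartin suggests), and it assumes a data.table version that provides the split method with a by argument.

```r
library(data.table)
library(ggplot2)

# Same dummy data as in the question
set.seed(222)
DT <- data.table(grp = rep(c("a", "b", "c"), each = 10),
                 x = rnorm(30, mean = 5, sd = 1),
                 y = rnorm(30, mean = 8, sd = 1))

# split() returns a list of data.tables named by group value;
# build one ggplot object per element and title it from the group column.
plots <- lapply(split(DT, by = "grp"), function(d) {
  ggplot(d, aes(x = x, y = y)) +
    geom_point(size = 3) +
    labs(title = paste0("Group: ", d$grp[1]))
})

# Plots can then be drawn one at a time, e.g. print(plots$a)
```

Keeping the objects in a list means a plot can be printed, saved or modified later, rather than being drawn once as a side effect.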

BrodieG answered Oct 12 '22 23:10