Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Arrangement of large number of plots and connect with lines in r

Tags:

plot

r

r-grid

I have a large number of small plots need to placed in a bigger plot canvase and arrange small plots into and connect them with lines. A small example will look like this:

A to L are independent plots. The cordinate of their placement is given.

enter image description here

plot grid coordinates: PlotgridX and plotgridY can decide when the small plot need to be centered

    plotcord <- data.frame (
plotname = c("A", "B", "C", "D",    "E",    "F",   "G", "H", "I", "J", "K", "L"),
plotgridX = c( 1.5, 2,   5,   5.5,   1.75,  5.25,  8   , 1 ,  2,   3.5,  6,  7.5),
 plotgridY = c( 3,  3,    3,    3,     2 ,    2,    2,   2  , 1,   1,   1,   1))


   plotname plotgridX plotgridY
1         A      1.50         3
2         B      2.00         3
3         C      5.00         3
4         D      5.50         3
5         E      1.75         2
6         F      5.25         2
7         G      8.00         2
8         H      1.00         2
9         I      2.00         1
10        J      3.50         1
11        K      6.00         1
12        L      7.50         1

The connecting lines is decided by the following data frame:

connectd <- data.frame (id = c(  "E",    "F", "I", "J", "K", "L"),
                        parent1 = c("A",  "C", "H", "E" ,"E", "F"),
                      parent2 = c("B",  "D",  "E", "F", "F", "G"))
connectd
  id parent1 parent2
1  E       A       B
2  F       C       D
3  I       H       E
4  J       E       F
5  K       E       F
6  L       F       G

For example, here figure E should be connected to its parent1 "A" and parent 2 "B" figures at the same time "A", "B" should be connected to make it "T shaped" connection. Similarly for the other ids.

Although I have other details to plot in each subplot, just as proof of concept I could like to plot one rectangles withing each plots with names n1 and n2, to make a plot like the following: enter image description here

like image 350
jon Avatar asked Feb 23 '13 17:02

jon


1 Answers

I'm writing this answer up, partly for posterity, and partly because I have been meaning to write up some functions like this for some others who have been trying to get into custom visualizations in R.

Background

In R, many people rightly leave behind the base plotting functions and begin to turn to the more flexible wrapper packages, 'lattice' and 'ggplot2'. These are powerful tools for rapidly exploring your data by applying layers of logic on top of a single plot. The packages then process all of the layers and produce one window of plots, arranged appropriately. These packages are wonderful, and I recommend every R user learn at least one of them.

One caveat, though, is that the 'lattice' and 'ggplot2' packages are really more for data exploration than intelligent data visualization. When creating a custom data visualization, these packages make too many decisions for you, because that's what a wrapper is for: taking some decisions out of your hands.

Custom visualization? Enter 'grid'

The base 'grid' package is the ultimate in drawing flexibility, partly because it extends the functionality of the base plotting functions, rather than wrapping them. With 'grid' functions, we gain the ability to create visual objects using a variety of different units for placement and sizing, and (this is really important) we gain the ability to use justifications for our objects' anchors. Paul Murrell's book, "R Graphics," is an excellent resource if you're wanting to learn. A copy of it sits out on my desk.

If you've ever used a vector graphics drawing program (like Illustrator or Inkscape), you probably already know what I'm talking about when I mention justifications. This is the ability to line items up by referencing the locations of other items. I'd talk about this more, but I could talk about it all day. Let's move on to the process.

The process

Now, I should preface this by saying it took me about two hours to write the function library, and about 5 minutes to write the demo code. I will be using the function library in the future as a training tool, and anyone can feel free to use/modify it.

The 'grid' process works in three basic steps:

  1. Make a viewport
  2. Draw some objects
  3. Pop your viewport

In making a viewport, we use 'pushViewport' to push the 'viewport' object, something like so:

pushViewport(viewport(x=0, y=1, xscale=c(1, 10), yscale=c(0, 100), width=0.25, height=0.25, default.units="npc", just=c("left","bottom"), clip="off"))

The basic viewport has an "npc" set of units where x goes from 0 to 1, left to right, and y goes 0 to 1, bottom to top. This means the origin is in the lower left-hand corner. The above viewport is created as one quarter of the plot in the lower left-hand corner. When we specify an "xscale" and "yscale", however, we gain the ability to reference the units "native" when drawing objects. This means we can use "native" units for drawing data and use "npc" units when drawing things like axes and labels.

When drawing objects, we use functions like 'grid.lines', 'grid.polygon', 'grid.points', 'grid.circle', and so on. Every visualization I have ever made has used these objects. When you draw the data by specifying these objects by hand, you gain an enormous amount of control. Filling in a line chart is one of the most obvious examples of added capability. A filled in area is just a polygon with points of the polygon specified by data and with two anchor points added. I use this to highlight areas of a line chart or to make it easier to read multiple lines on the same chart.

You can also get creative, for instance, creating bars that aren't rectangles, or combining multiple plots in a more sophisticated way. I and some others recently ran a sci-fi themed walking game, and we used a custom chart (made with 'grid') to display our final performance. The chart combines the number of days on the "survivor" team as a time axis, displays player vs. enemy steps per day as a bar chart, and displays cumulative player and enemy steps per day as a filled line chart. I would have been hard-pressed to create a comparable visual using the 'lattice' or 'ggplot2' packages.

Here is a sample of one of the charts (sans the real-life player name), to give an idea of just how flexible 'grid' visuals can be:

Sample sophisticated chart

A proof of concept for the question

Now to specifically address the question posed by the OP. In the question, the OP implies that s/he will be plotting charts inside each area. This can get tricky when using pre-built plotting packages, because most plotting functions will overwrite any plot specifications you have already set up. Instead, it is more reliable to use something like the base 'grid' functions to specify plotting areas and then draw the necessary data objects within the viewports.

To keep from working too hard, I first wrote a custom function library that sets my various chart parameters and draws each type of chart for me. I don't like debugging code, so functions are the way I work through things in pieces. Each time I get a piece of code right, I throw it into a function for later use.

The code may look a little complicated, but remember the three 'grid' steps: push viewport, draw, pop viewport. This is what each function is doing. To demo the work, I made four different drawing functions: filled line chart, scatter plot, histogram, and a box drawing as suggested by the OP. Each function is flexible enough to accommodate multiple sets of data values in each chart, setting the alpha values to compensate and allowing us to see the values plotted over top of each other.

In a case like this, you only make your functions as flexible as you need to, so I did take one shortcut on the lines and drew them from a little bit of code in the demo that made a LOT of assumptions. I still drew it with logic-driven code, though, to demo how to draw more complex objects with simple logic.

Here is the result of the demo code, using some built-in R datasets for easy data (EuStockMarkets, nottem, sunspots.month):

Chart demo of grid functionality


Custom function library:


library(grid)

# Specify general chart options.
chart_Fill = "lemonchiffon"
chart_Col = "snow3"
space_Background = "white"
title_CEX = 0.8
axis_CEX = 0.6
chart_Width <- 3/3
chart_Height <- 2/5

# Function to initialize a plotting area.
init_Plot <- function(
    .df,
    .x_Loc, 
    .y_Loc, 
    .justify, 
    .width, 
    .height
    ){

    # Initialize plotting area to fit data.
    # We have to turn off clipping to make it
    # easy to plot the labels around the plot.
    pushViewport(viewport(xscale=c(min(.df[,1]), max(.df[,1])), yscale=c(min(0,min(.df[,-1])), max(.df[,-1])), x=.x_Loc, y=.y_Loc, width=.width, height=.height, just=.justify, clip="off", default.units="native"))

    # Color behind text.
    grid.rect(x=0, y=0, width=unit(axis_CEX, "lines"), height=1, default.units="npc", just=c("right", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
    grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))

    # Color in the space.
    grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))
}

# Function to finalize and label a plotting area.
finalize_Plot <- function(
    .df, 
    .plot_Title
    ){

    # Label plot using the internal reference
    # system, instead of the parent window, so
    # we always have perfect placement.
    grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))
    grid.text(paste(names(.df)[-1], collapse=" & "), x=-0.05, y=0.5, just=c("center","bottom"), rot=90, default.units="npc", gp=gpar(cex=axis_CEX))
    grid.text(names(.df)[1], x=0.5, y=-0.05, just=c("center","top"), rot=0, default.units="npc", gp=gpar(cex=axis_CEX))

    # Finalize plotting area.
    popViewport()
}

# Function to plot a filled line chart of
# the data in a data frame.  The first column
# of the data frame is assumed to be the
# plotting index, with each column being a
# set of y-data to plot.  All data is assumed
# to be numeric.
plot_Line_Chart <- function(
    .df,
    .x_Loc,
    .y_Loc,
    .justify,
    .width,
    .height,
    .colors,
    .plot_Title
    ){

    # Initialize plot.
    init_Plot(.df, .x_Loc, .y_Loc, .justify, .width, .height)

    # Calculate what value to use as the
    # return for the polygons.
    y_Axis_Min <- min(0, min(.df[,-1]))

    # Plot each set of data as a polygon,
    # so we can fill it in with color to
    # make it easier to read.
    for (i in 2:ncol(.df)){
        grid.polygon(x=c(min(.df[,1]),.df[,1], max(.df[,1])), y=c(y_Axis_Min,.df[,i], y_Axis_Min), default.units="native", gp=gpar(fill=.colors[i-1], col=.colors[i-1], alpha=1/ncol(.df)))
    }

    # Draw plot axes.
    grid.lines(x=0, y=c(0,1), default.units="npc")
    grid.lines(x=c(0,1), y=0, default.units="npc")

    # Finalize plot.
    finalize_Plot(.df, .plot_Title)

}

# Function to plot a scatterplot of
# the data in a data frame.  The
# assumptions are the same as 'plot_Line_Chart'.
plot_Scatterplot <- function(
    .df,
    .x_Loc,
    .y_Loc,
    .justify,
    .width,
    .height,
    .colors,
    .plot_Title
    ){

    # Initialize plot.
    init_Plot(.df, .x_Loc, .y_Loc, .justify, .width, .height)

    # Plot each set of data as colored points.
    for (i in 2:ncol(.df)){
        grid.points(x=.df[,1], y=.df[,i], pch=19, size=unit(1, "native"), default.units="native", gp=gpar(col=.colors[i-1], alpha=1/ncol(.df)))
    }

    # Draw plot axes.
    grid.lines(x=0, y=c(0,1), default.units="npc")
    grid.lines(x=c(0,1), y=0, default.units="npc")

    # Finalize plot.
    finalize_Plot(.df, .plot_Title)

}

# Function to plot a histogram of
# all the columns in a data frame,
# except the first, which is assumed to
# be an index.
plot_Histogram <- function(
    .df,
    .x_Loc,
    .y_Loc,
    .justify,
    .width,
    .height,
    .colors,
    .plot_Title,
    ...
    ){

    # Create a list containing the histogram
    # data for each data column and calculate
    # data ranges.  Any extra parameters
    # specified will pass to the 'hist' function.
    hist_Data <- list()
    hist_Count_Range <- c(0,NA)
    hist_Breaks_Range <- c(NA,NA)
    for (i in 2:ncol(.df)){
        hist_Data[[i]] <- hist(.df[,i], plot=FALSE, ...)
        hist_Count_Range[2] <- max(max(hist_Data[[i]]$counts), hist_Count_Range[2], na.rm=TRUE)
        hist_Breaks_Range <- c(min(min(hist_Data[[i]]$breaks), hist_Breaks_Range[1], na.rm=TRUE), max(max(hist_Data[[i]]$breaks), hist_Breaks_Range[2], na.rm=TRUE))
    }


    # Initialize plotting area to fit data.
    # We are doing this in a custom way to
    # allow more flexibility than built into
    # the 'init_Plot' function.
    # We have to turn off clipping to make it
    # easy to plot the labels around the plot.
    pushViewport(viewport(xscale=hist_Breaks_Range, yscale=hist_Count_Range, x=.x_Loc, y=.y_Loc, width=.width, height=.height, just=.justify, clip="off", default.units="native"))

    # Color behind text.
    grid.rect(x=0, y=0, width=unit(axis_CEX, "lines"), height=1, default.units="npc", just=c("right", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
    grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))

    # Color in the space.
    grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))


    # Draw x axis.
    grid.lines(x=c(0,1), y=0, default.units="npc")

    # Plot each set of data as a histogram.
    for (i in 2:ncol(.df)){
        grid.rect(x=hist_Data[[i]]$mids, y=0, width=diff(hist_Data[[i]]$mids[1:2]), height=hist_Data[[i]]$counts, default.units="native", just=c("center","bottom"), gp=gpar(fill=.colors[i-1], col=.colors[i-1], alpha=1/ncol(.df)))
    }

    # Label plot using the internal reference
    # system, instead of the parent window, so
    # we always have perfect placement.
    grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))
    grid.text(paste(names(.df)[-1], collapse=" & "), x=-0.05, y=0.5, just=c("center","bottom"), rot=90, default.units="npc", gp=gpar(cex=axis_CEX))

    # Finalize plotting area.
    popViewport()
}

draw_Sample_Box <- function(
    .x_Loc,
    .y_Loc,
    .x_Scale,
    .y_Scale,
    .justify,
    .width,
    .height,
    .colors,
    .box_X,
    .box_Y,
    .plot_Title
    ){

    pushViewport(viewport(xscale=.x_Scale, yscale=.y_Scale, x=.x_Loc, y=.y_Loc, width=chart_Width, height=chart_Height, just=.justify, clip="off", default.units="native"))

    # Color behind text.
    grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))

    # Color in the space.
    grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))

    # Label plot.
    grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))

    # Draw box and label points.
    grid.polygon(x=.box_X, y=.box_Y, default.units="native", gp=gpar(fill=.colors[1], col=.colors[2]))
    grid.text(paste(.plot_Title, 1, sep=""), x=min(.box_X), y=min(.box_Y), default.units="native", just=c("right","top"), gp=gpar(cex=0.5))
    grid.text(paste(.plot_Title, 2, sep=""), x=max(.box_X), y=min(.box_Y), default.units="native", just=c("left","top"), gp=gpar(cex=0.5))

    # Finalize plot.
    popViewport()
}

Demo code:


# Draw twelve independent charts as
# a demo and connect with lines similar
# to a heiritage chart.
grid.newpage()

# Initialize a viewport to make our locations
# easier to map.
pushViewport(viewport(x=0, y=0, width=1, height=1, just=c("left","bottom"), xscale=c(0,10), yscale=c(0,4)))

# Color background of overall plot.
grid.rect(gp=gpar(fill=space_Background, col=space_Background))

# Store plot locations for convenience.
plot_Loc <- data.frame(x=c(2,4,6,8,1,3,7,9,2,4,6,8), y=c(3,3,3,3,2,2,2,2,1,1,1,1))

# Draw connecting lines.
connections <- data.frame(a=c(1, 3, 5, 6, 7, 1, 3, 5, 7, 6), b=c(2, 4, 6, 7, 8, 2, 4, 6, 8, 7), c=c(NA, NA, NA, NA, NA, 6, 7, 9, 12, 10), d=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 11))
for (i in 1:nrow(connections)){
    if (is.na(connections$c[i])){
        grid.lines(x=plot_Loc$x[unlist(connections[i,1:2])], y=plot_Loc$y[unlist(connections[i,1:2])], default.units="native")
    } else if (is.na(connections$d[i])) {
        grid.lines(x=median(plot_Loc$x[unlist(connections[i,1:2])]), y=plot_Loc$y[unlist(connections[i,2:3])], default.units="native")
    } else {
        grid.lines(x=median(plot_Loc$x[unlist(connections[i,1:2])]), y=c(plot_Loc$y[connections[i,2]], median(plot_Loc$y[unlist(connections[i,2:3])])), default.units="native")
        grid.lines(x=plot_Loc$x[unlist(connections[i,3:4])], y=median(plot_Loc$y[unlist(connections[i,2:3])]), default.units="native")
        grid.lines(x=plot_Loc$x[connections[i,3]], y=c(median(plot_Loc$y[unlist(connections[i,2:3])]), plot_Loc$y[connections[i,3]]), default.units="native")
        grid.lines(x=plot_Loc$x[connections[i,4]], y=c(median(plot_Loc$y[unlist(connections[i,2:3])]), plot_Loc$y[connections[i,4]]), default.units="native")
    }
}


# Draw four independent line charts.
p <- 1
plot_Line_Chart(data.frame(time=1:1860, EuStockMarkets)[1:3], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("dodgerblue", "deeppink"), "EU Stocks")
p <- 2
plot_Line_Chart(data.frame(time=1:1860, EuStockMarkets)[c(1,4,5)], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("green", "purple"), "EU Stocks")
p <- 3
plot_Line_Chart(data.frame(time=1:(12*20), sunspots=sunspot.month[(171*12+1):(171*12+12*20)]), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("darkgoldenrod"), "Sunspots")
p <- 4
plot_Line_Chart(data.frame(time=1:(12*20), temp=nottem), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("red"), "Nottem")

# Draw four independent scatterplots.
p <- 5
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 1000), DAX=rowMeans(embed(EuStockMarkets[,1], 1000)), FTSE=rowMeans(embed(EuStockMarkets[,4], 1000))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth")
p <- 6
plot_Scatterplot(data.frame(time=1:1860, EuStockMarkets)[c(1,2,5)], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "EU Stocks")
p <- 9
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 20), DAX=rowMeans(embed(EuStockMarkets[,1], 20)), FTSE=rowMeans(embed(EuStockMarkets[,4], 20))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth*20")
p <- 10
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 100), DAX=rowMeans(embed(EuStockMarkets[,1], 100)), FTSE=rowMeans(embed(EuStockMarkets[,4], 100))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth*100")


# Draw two independent histograms.
p <- 7
plot_Histogram(data.frame(time=1:(12*20), sunspots=sunspot.month[(171*12+1):(171*12+12*20)]), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("darkgoldenrod"), "Sunspots", breaks=6)
p <- 8
plot_Histogram(data.frame(time=1:(12*20), temp=nottem), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("red"), "Nottem", breaks=6)

# Draw sample objects in two charts spaces.
p <- 11
draw_Sample_Box(.x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .x_Scale=c(0,10), .y_Scale=c(-10,0), .justify=c("center","center"), .width=chart_Width, .height=chart_Height, .colors=c("dodgerblue","blue"), .box_X=c(4,6,6,4), .box_Y=c(-4,-4,-5,-5), .plot_Title="K")
p <- 12
draw_Sample_Box(.x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .x_Scale=c(-1,1), .y_Scale=c(0,1), .justify=c("center","center"), .width=chart_Width, .height=chart_Height, .colors=c("dodgerblue","blue"), .box_X=c(-0.5,0,0,-0.5), .box_Y=c(0.8,0.8,0.7,0.7), .plot_Title="L")
like image 193
Dinre Avatar answered Sep 20 '22 21:09

Dinre