Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

vertically distribute multiple lines with smart spacing

A common display of spectroscopic data (intensity vs wavelength) is used below to compare the position of peaks in the data across multiple spectra. Assuming they all share a baseline at 0, it is convenient to offset the multiple lines vertically by a constant spacing, to avoid the distraction of overlapping lines.

enter image description here

Thus becomes

enter image description here

I'm looking for a better strategy to perform this vertical shift automatically, starting from data in long format. Here is a minimal example.

# fake data (5 similar-looking spectra)
spec <- function(){
  x <- runif(100, 0, 100)
  data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))

My current strategy is as follows:

  • convert the spectra from long format to wide format. This involves interpolation, as the spectra do not necessarily have identical x axis values.

  • find the minimum offset between spectra to avoid overlap between neighbours

  • shift the spectra by multiples of this distance

  • melt back to long format

I implemented this using plyr,

# function that evenly spaces the spectra to avoid overlap
# d is in long format, s is a scaling factor for the vertical shift
require(plyr); require(ggplot2)

spread_plot <- function(d, s=1){
  ranges <- ddply(d, "id", with, each(min,max,length)(x))
  common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
  new_y <- dlply(d, "id", function(x) approx(x$x, x$y, common_x)$y)
  mat <- do.call(cbind, new_y)
  test <- apply(mat, 1, diff)
  shift <- max(-test[test < 0])
  origins <- s*seq(0, by=shift, length=ncol(mat))

  for(ii in seq_along(origins)){
    current <- unique(d[["id"]])[ii]
    d[d[["id"]] == current, "y"] <- 
      d[d[["id"]] == current, "y"] + origins[ii]
  }
  d
}

test <- spread_plot(all)

ggplot(test, aes(x, y, colour=id, group=id))+
  geom_line() + guides(colour=guide_legend())

This strategy suffers from a few shortcomings:

  • it is slow

  • the offset is not a pretty number; I do not know how to automatically round it well so that spectra are offset e.g. by 0.02, or 50, etc. depending on the range of the intensities. pretty(origins) is problematic in that it can return a different number of values.

I feel I'm missing a simpler solution, perhaps working directly with the original data in long format.

like image 256
baptiste Avatar asked Nov 08 '13 17:11

baptiste


2 Answers

Interesting question.

Here's a possibility, offered without detailed comment, except to point out that it:

  • Should be very fast, due to a combo of its avoidance of plyr, use of data.table, and operation on data in its original long format.
  • Uses pretty() to pick a pretty offset.
  • Like your code, is not guaranteed to produce no intersections of lines, since overlap can happen between the lattice of points formed by common_x.

Here's the code

## Setup
library(data.table)
library(plyr)
library(ggplot2)

spec <- function(){
  x <- runif(100, 0, 100)
  data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))

## Function that uses data.table rather than plyr to compute and add offsets
spread_plot <- function(d, s=1){
    d <- data.table(d, key="id")
    ranges <- d[, list(min=min(x), max=max(x), length=length(x)),by="id"]
    common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
    new_y <- d[,list(y=approx(x, y, common_x)$y, N=seq_along(common_x)),
               by="id"]
    shift <- max(new_y[, max(abs(diff(y))), by = "N"][[2]])
    shift <- pretty(c(0, shift), n=0)[2]
    origins <- s*seq(0, by=shift, length=length(unique(d$id)))
    d[,y:=(y + origins[.GRP]),by="id"]
    d
}

## Try it out
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
  geom_line() + guides(colour=guide_legend())

enter image description here

like image 127
Josh O'Brien Avatar answered Nov 20 '22 21:11

Josh O'Brien


I still think you could rely on some assumptions about typical data from spectroscopy. Usually, x values are sorted, the number of them is equal for all spectra and they are quite similar:

# new fake data (5 similar-looking spectra)
spec <- function(){
  x <- jitter(seq(0,100,1),0.1)
  data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))

If these assumptions are valid, you could treat the spectra as having identical x values:

library(ggplot2)
spread_plot  <- function(d, s=0.05) {
  #add some checks here, e.g., for equal length 
  d <- d[order(d$x),]
  d$id <- factor(d$id)
  l <- levels(d$id)
  pretty_offset <- pretty(s*min(tapply(d$y, d$id, function(x) abs(diff(range(x))))))[2]

  for (i in seq_len(length(l)-1)+1) {
      mean_delta_y <- mean(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"])
      d[d$id == l[i], "y"] <-  d[d$id == l[i], "y"] - mean_delta_y
      min_delta_y <- abs(1.05 * min(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"]))
      pretty_delta_y <- max(min_delta_y, pretty_offset)
      d[d$id == l[i], "y"] <-  d[d$id == l[i], "y"] + pretty_delta_y
      }
  p <- ggplot(d, aes(x=x, y=y, col=id)) + geom_line()
  print(p)
}
spread_plot(all, s=0)

enter image description here

spread_plot(all, s=0.5)

enter image description here

like image 29
Roland Avatar answered Nov 20 '22 22:11

Roland