A common display of spectroscopic data (intensity vs wavelength) is used below to compare the position of peaks in the data across multiple spectra. Assuming they all share a baseline at 0, it is convenient to offset the multiple lines vertically by a constant spacing, to avoid the distraction of overlapping lines.
Thus becomes
I'm looking for a better strategy to perform this vertical shift automatically, starting from data in long format. Here is a minimal example.
# fake data (5 similar-looking spectra)
spec <- function(){
x <- runif(100, 0, 100)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
My current strategy is as follows:
convert the spectra from long format to wide format. This involves interpolation, as the spectra do not necessarily have identical x axis values.
find the minimum offset between spectra to avoid overlap between neighbours
shift the spectra by multiples of this distance
melt back to long format
I implemented this using plyr,
# function that evenly spaces the spectra to avoid overlap
# d is in long format, s is a scaling factor for the vertical shift
require(plyr); require(ggplot2)
spread_plot <- function(d, s=1){
ranges <- ddply(d, "id", with, each(min,max,length)(x))
common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
new_y <- dlply(d, "id", function(x) approx(x$x, x$y, common_x)$y)
mat <- do.call(cbind, new_y)
test <- apply(mat, 1, diff)
shift <- max(-test[test < 0])
origins <- s*seq(0, by=shift, length=ncol(mat))
for(ii in seq_along(origins)){
current <- unique(d[["id"]])[ii]
d[d[["id"]] == current, "y"] <-
d[d[["id"]] == current, "y"] + origins[ii]
}
d
}
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
geom_line() + guides(colour=guide_legend())
This strategy suffers from a few shortcomings:
it is slow
the offset is not a pretty number; I do not know how to automatically round it well so that spectra are offset e.g. by 0.02, or 50, etc. depending on the range of the intensities. pretty(origins)
is problematic in that it can return a different number of values.
I feel I'm missing a simpler solution, perhaps working directly with the original data in long format.
Interesting question.
Here's a possibility, offered without detailed comment, except to point out that it:
pretty()
to pick a pretty offset.common_x
.Here's the code
## Setup
library(data.table)
library(plyr)
library(ggplot2)
spec <- function(){
x <- runif(100, 0, 100)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
## Function that uses data.table rather than plyr to compute and add offsets
spread_plot <- function(d, s=1){
d <- data.table(d, key="id")
ranges <- d[, list(min=min(x), max=max(x), length=length(x)),by="id"]
common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
new_y <- d[,list(y=approx(x, y, common_x)$y, N=seq_along(common_x)),
by="id"]
shift <- max(new_y[, max(abs(diff(y))), by = "N"][[2]])
shift <- pretty(c(0, shift), n=0)[2]
origins <- s*seq(0, by=shift, length=length(unique(d$id)))
d[,y:=(y + origins[.GRP]),by="id"]
d
}
## Try it out
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
geom_line() + guides(colour=guide_legend())
I still think you could rely on some assumptions about typical data from spectroscopy. Usually, x values are sorted, the number of them is equal for all spectra and they are quite similar:
# new fake data (5 similar-looking spectra)
spec <- function(){
x <- jitter(seq(0,100,1),0.1)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
If these assumptions are valid, you could treat the spectra as having identical x values:
library(ggplot2)
spread_plot <- function(d, s=0.05) {
#add some checks here, e.g., for equal length
d <- d[order(d$x),]
d$id <- factor(d$id)
l <- levels(d$id)
pretty_offset <- pretty(s*min(tapply(d$y, d$id, function(x) abs(diff(range(x))))))[2]
for (i in seq_len(length(l)-1)+1) {
mean_delta_y <- mean(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"])
d[d$id == l[i], "y"] <- d[d$id == l[i], "y"] - mean_delta_y
min_delta_y <- abs(1.05 * min(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"]))
pretty_delta_y <- max(min_delta_y, pretty_offset)
d[d$id == l[i], "y"] <- d[d$id == l[i], "y"] + pretty_delta_y
}
p <- ggplot(d, aes(x=x, y=y, col=id)) + geom_line()
print(p)
}
spread_plot(all, s=0)
spread_plot(all, s=0.5)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With