Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Peak detection in Manhattan plot

Tags:

r

The attached plot (Manhattan plot) contains on the x axis chromosome positions from the genome and on the Y axis -log(p), where p is a p-value associated with the points (variants) from that specific position. enter image description here

I have used the following R code to generate it (from the gap package) :

require(gap)
affy <-c(40220, 41400, 33801, 32334, 32056, 31470, 25835, 27457, 22864, 28501, 26273, 
     24954, 19188, 15721, 14356, 15309, 11281, 14881, 6399, 12400, 7125, 6207)
CM <- cumsum(affy)
n.markers <- sum(affy)
n.chr <- length(affy)
test <- data.frame(chr=rep(1:n.chr,affy),pos=1:n.markers,p=runif(n.markers))
oldpar <- par()
par(cex=0.6)
colors <- c("red","blue","green","cyan","yellow","gray","magenta","red","blue","green",          "cyan","yellow","gray","magenta","red","blue","green","cyan","yellow","gray","magenta","red")
mhtplot(test,control=mht.control(colors=colors),pch=19,bg=colors)
> head(test)
  chr pos          p
1   1   1 0.79296584
2   1   2 0.96675136
3   1   3 0.43870076
4   1   4 0.79825513
5   1   5 0.87554143
6   1   6 0.01207523

I am interested in getting the coordinates of the peaks of the plot above a certain threshold (-log(p)) .

like image 434
agatha Avatar asked Nov 13 '22 10:11

agatha


1 Answers

If you want the indices of the values above the 99th percentile:

# Add new column with log values
test = transform(test, log_p = -log10(test[["p"]]))
# Get the 99th percentile
pct99 = quantile(test[["log_p"]], 0.99)

...and get the values from the original data test:

peaks = test[test[["log_p"]] > pct99,]
> head(peaks)
    chr pos           p    log_p
5     1   5 0.002798126 2.553133
135   1 135 0.003077302 2.511830
211   1 211 0.003174833 2.498279
586   1 586 0.005766859 2.239061
598   1 598 0.008864987 2.052322
790   1 790 0.001284629 2.891222

You can use this with any threshold. Note that I have not calculated the first derivative, see this question for some pointers:

How to calculate first derivative of time series

after calculating the first derivative, you can find the peaks by looking at points in the timeseries where the first derivative is (almost) zero. After identifying these peaks, you can check which ones are above the threshold.

like image 106
Paul Hiemstra Avatar answered Nov 15 '22 11:11

Paul Hiemstra