R overlap multiple GRanges with findOverlaps()

I have three tables with differing genomic intervals. Here is an example:

> a
   chr interval.start interval.end names
1 chr1              5           10     a
2 chr1              6           10     b
3 chr2              7           10     c
4 chr3              8           10     d

> b
   chr interval.start interval.end names
1 chr1              6           15     e
2 chr1              7           15     f
3 chr1              8           15     g

> c
    chr interval.start interval.end names
1  chr1              7           12     h
2  chr1              8           12     i
3  chr5              9           12     j
4 chr10             10           12     k
5 chr20             11           12     l

I am trying to find the intervals common to all three tables after converting them to GRanges objects. Essentially I want to do something like intersect(c, intersect(a, b)). However, because these are genomic coordinates, I have to do this with GRanges objects from the GenomicRanges package, which I am not familiar with.

I can do findOverlaps(gr, gr1) or findOverlaps(gr1, gr2), but is there an easy way to overlap multiple GRanges at once like findOverlaps(gr, gr1, gr2)?

Any help would be appreciated. If this question was asked elsewhere, I apologize in advance.

Thanks

user2804480 asked Apr 28 '14 02:04


2 Answers

You can run subsetByOverlaps on one pair, then compare the resulting subset against the third set:

sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)

Or directly

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

The result is the subset of ranges in the first GRanges object (gr) that overlap at least one range in gr1 and at least one range in gr2.

Depending on the type of overlap you want and which has the largest ranges, you should consider which to use as the query and which the subject.
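
To illustrate why the choice of query matters, here is a minimal sketch that assumes gr, gr1 and gr2 are built from tables a, b and c in the question. The metadata column name `id` is made up for this example, and the calls may emit a warning about sequence levels that are not shared between the objects.

```r
library(GenomicRanges)

# Rebuild the question's tables a, b and c as GRanges objects.
# The metadata column `id` is a name chosen for this sketch only.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr3"),
  ranges   = IRanges(start = 5:8, end = rep(10, 4)),
  id       = c("a", "b", "c", "d")
)
gr1 <- GRanges(
  seqnames = rep("chr1", 3),
  ranges   = IRanges(start = 6:8, end = rep(15, 3)),
  id       = c("e", "f", "g")
)
gr2 <- GRanges(
  seqnames = c("chr1", "chr1", "chr5", "chr10", "chr20"),
  ranges   = IRanges(start = 7:11, end = rep(12, 5)),
  id       = c("h", "i", "j", "k", "l")
)

# gr as the query: keeps the ranges of gr that overlap something
# in gr1 and something in gr2.
from_gr <- Reduce(subsetByOverlaps, list(gr, gr1, gr2))
from_gr$id   # "a" "b"

# gr2 as the query: the same filtering, but the ranges returned
# are those of gr2.
from_gr2 <- Reduce(subsetByOverlaps, list(gr2, gr1, gr))
from_gr2$id  # "h" "i"
```

In both cases the returned ranges keep the coordinates and metadata of whichever object comes first in the list.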

JeremyS answered Sep 24 '22 20:09


The following returns the exact intersection of all the ranges:

Reduce(intersect, list(gr, gr1, gr2))

In:

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

subsetByOverlaps takes its first argument as the query (here gr) and returns the ranges of the query that overlap at least one range in each subject (gr1, then gr2). Those ranges keep the query's original coordinates and are not trimmed to the shared region. So to find the common intervals (the regions of intersection), intersect is the appropriate function.
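
The difference can be seen on the question's data. This is a sketch only: it assumes gr, gr1 and gr2 correspond to tables a, b and c, the `id` metadata column is a name invented here, and a warning about unshared sequence levels may be printed.

```r
library(GenomicRanges)

# Tables a, b and c from the question as GRanges objects.
# `id` is a hypothetical metadata column used only in this sketch.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr3"),
  ranges   = IRanges(start = 5:8, end = rep(10, 4)),
  id       = c("a", "b", "c", "d")
)
gr1 <- GRanges(
  seqnames = rep("chr1", 3),
  ranges   = IRanges(start = 6:8, end = rep(15, 3)),
  id       = c("e", "f", "g")
)
gr2 <- GRanges(
  seqnames = c("chr1", "chr1", "chr5", "chr10", "chr20"),
  ranges   = IRanges(start = 7:11, end = rep(12, 5)),
  id       = c("h", "i", "j", "k", "l")
)

# intersect() works on the merged ranges of each object and returns
# only the region shared by all three; metadata columns are dropped.
common <- Reduce(intersect, list(gr, gr1, gr2))
common        # a single range, chr1:7-10

# subsetByOverlaps() returns whole ranges from gr, untrimmed.
kept <- Reduce(subsetByOverlaps, list(gr, gr1, gr2))
start(kept)   # 5 6
end(kept)     # 10 10
```

Note that intersect collapses overlapping ranges within each object before intersecting, so the result no longer maps one-to-one onto the original rows.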

user1938965 answered Sep 20 '22 20:09