I have three tables with differing genomic intervals. Here is an example:
> a
   chr interval.start interval.end names
1 chr1              5           10     a
2 chr1              6           10     b
3 chr2              7           10     c
4 chr3              8           10     d
> b
   chr interval.start interval.end names
1 chr1              6           15     e
2 chr1              7           15     f
3 chr1              8           15     g
> c
    chr interval.start interval.end names
1  chr1              7           12     h
2  chr1              8           12     i
3  chr5              9           12     j
4 chr10             10           12     k
5 chr20             11           12     l
I am trying to find the intervals common to all three tables after converting them to GRanges objects. Essentially I want to do something like intersect(c, intersect(a, b)). However, because these are genomic coordinates, I have to do this with GRanges from the GenomicRanges package, which I am not familiar with.
I can do findOverlaps(gr, gr1) or findOverlaps(gr1, gr2), but is there an easy way to overlap multiple GRanges at once like findOverlaps(gr, gr1, gr2)?
Any help would be appreciated. If this question was asked elsewhere, I apologize in advance.
Thanks
You can subset one of them with subsetByOverlaps in a single pairwise comparison, then compare that subset against the third set.
sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)
Or directly
Reduce(subsetByOverlaps, list(gr, gr1, gr2))
This returns the ranges of the first GRanges object (gr) that overlap at least one range in each of the other two objects.
Depending on the type of overlap you want and which has the largest ranges, you should consider which to use as the query and which the subject.
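To make the query/subject point concrete, here is a minimal sketch using the tables from the question. The helper to_gr and the name c_tab are hypothetical names introduced only for this illustration, and the names column is left out for brevity.

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): build a GRanges object
# from a table shaped like the ones in the question.
to_gr <- function(df) {
  GRanges(seqnames = df$chr,
          ranges = IRanges(start = df$interval.start, end = df$interval.end))
}

a <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                interval.start = 5:8, interval.end = 10)
b <- data.frame(chr = "chr1", interval.start = 6:8, interval.end = 15)
c_tab <- data.frame(chr = c("chr1", "chr1", "chr5", "chr10", "chr20"),
                    interval.start = 7:11, interval.end = 12)

gr  <- to_gr(a)
gr1 <- to_gr(b)
gr2 <- to_gr(c_tab)

# The objects do not share all chromosomes, so GenomicRanges may warn about
# non-shared sequence levels; the warnings are silenced here for readability.
kept_a <- suppressWarnings(Reduce(subsetByOverlaps, list(gr, gr1, gr2)))
kept_b <- suppressWarnings(Reduce(subsetByOverlaps, list(gr1, gr, gr2)))

kept_a  # ranges taken from gr:  chr1:5-10 and chr1:6-10
kept_b  # ranges taken from gr1: chr1:6-15, chr1:7-15 and chr1:8-15
```

Changing which object comes first changes which table's ranges are returned, so put the set whose coordinates you want to keep at the front of the list.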
The following works for getting the exact intersection of all the ranges:
Reduce(intersect, list(gr, gr1, gr2))
In:
Reduce(subsetByOverlaps, list(gr, gr1, gr2))
Reduce applies subsetByOverlaps pairwise from the left, and subsetByOverlaps treats its first argument (here gr) as the query. The result therefore contains whole ranges from gr that overlap at least one range in gr1 and at least one range in gr2; those ranges are not trimmed to the shared region. So to find the common intervals themselves (the regions of intersection), intersect is the appropriate function.
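As a rough check of that difference, the sketch below rebuilds the three tables from the question and runs both calls. The helper to_gr and the name c_tab are hypothetical and exist only for this example; strand is left unspecified.

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): table -> GRanges.
to_gr <- function(df) {
  GRanges(seqnames = df$chr,
          ranges = IRanges(start = df$interval.start, end = df$interval.end))
}

gr  <- to_gr(data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                        interval.start = 5:8, interval.end = 10))
gr1 <- to_gr(data.frame(chr = "chr1", interval.start = 6:8, interval.end = 15))
c_tab <- data.frame(chr = c("chr1", "chr1", "chr5", "chr10", "chr20"),
                    interval.start = 7:11, interval.end = 12)
gr2 <- to_gr(c_tab)

# Warnings about non-shared sequence levels may appear because the tables
# cover different chromosomes; they are silenced here for readability.
overlapping <- suppressWarnings(Reduce(subsetByOverlaps, list(gr, gr1, gr2)))
common      <- suppressWarnings(Reduce(intersect, list(gr, gr1, gr2)))

overlapping  # untrimmed ranges from gr: chr1:5-10 and chr1:6-10
common       # the shared region only:   chr1:7-10
```

Note that intersect treats each object as a set of genomic positions, so overlapping ranges within one object are merged before the comparison and the per-range names from the input tables are not carried over.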