Merge overlapping ranges into unique groups, in dataframe

Question

I have a dataframe of n rows and 3 columns:

df <- data.frame(start=c(178,400,983,1932,33653),
    end=c(5025,5025, 5535, 6918, 38197),
    group=c(1,1,2,2,3))

df
  start   end group
1   178  5025     1
2   400  5025     1
3   983  5535     2
4  1932  6918     2
5 33653 38197     3

I would like to add a new column, df$group2, that assigns overlapping groups to the same group. For example, group 1 spans 178 to 5025, which overlaps with group 2, spanning 983 to 6918. The new column should therefore classify groups 1 and 2 together as group 1 (and, consequently, group 3 as group 2).

Result:

df
  start   end group group2
1   178  5025     1      1
2   400  5025     1      1
3   983  5535     2      1
4  1932  6918     2      1
5 33653 38197     3      2

Thanks in advance for any help.

Michael · Accepted Answer

I think this is possible with data.table::foverlaps:

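library(data.table)
setDT(df)                          # convert df to a data.table by reference
setkey(df, start, end)             # foverlaps() needs its second table keyed on the interval columns
df[, row_id := seq_len(nrow(df))]  # tag each row so it can be matched back later

# Self-join: one row per pair of overlapping intervals (each interval also matches itself).
# Duplicated column names from the first argument get an "i." prefix.
temp <- foverlaps(df, df)
# Widen every interval to cover the intervals it overlaps, once for each side of the join.
temp[, `:=`(c("start","end"),list(min(start,i.start),max(end,i.end))),by=row_id]
temp[, `:=`(c("start","end"),list(min(start,i.start),max(end,i.end))),by=i.row_id]
# Rows that end up with the same widened (start, end) share a group number.
temp2 <- temp[, list(group2=.GRP, row_id=unique(c(row_id,i.row_id))),by=.(start,end)][,.(row_id,group2)]

# Join the new group numbers back onto the original rows.
setkey(df,row_id)
setkey(temp2,row_id)
temp2[df]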
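
As an alternative sketch (not the approach above, just a different way to get the same column): if the rows are sorted by start, a row opens a new group exactly when it starts after the furthest end seen so far, so a running maximum is enough and the result does not depend on how many intervals are chained together. The names dt and prev_max_end are made up for this illustration, and intervals that merely share an endpoint are assumed to count as overlapping.

```r
library(data.table)

# Same example data as in the question.
dt <- data.table(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))

# Sort by start so each interval is only compared with earlier ones.
setorder(dt, start, end)

# Furthest end among all earlier rows (-Inf for the first row).
dt[, prev_max_end := shift(cummax(end), fill = -Inf)]

# A new group begins whenever a row starts after everything seen so far.
dt[, group2 := cumsum(start > prev_max_end)]

# Drop the helper column.
dt[, prev_max_end := NULL]

dt
```

Note that setorder() reorders the rows, so if the original order matters it may be worth keeping a row id and sorting back afterwards.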

Arun · Answer

You'll need the IRanges package:

require(IRanges)
ir <- IRanges(df$start, df$end)
df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
df

#  start   end group group2
# 1   178  5025     1      1
# 2   400  5025     1      1
# 3   983  5535     2      1
# 4  1932  6918     2      1
# 5 33653 38197     3      2
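
To see what the one-liner is doing, it may help to split it into its two steps. This is only an unpacked sketch of the same calls on the example data: reduce() collapses the overlapping ranges into merged ones, and findOverlaps() reports which merged range each original range falls into. The variable names merged and hits are just for illustration.

```r
library(IRanges)

df <- data.frame(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))

ir <- IRanges(df$start, df$end)

# Step 1: collapse overlapping ranges into merged, non-overlapping ranges.
merged <- reduce(ir)
start(merged)
end(merged)

# Step 2: find which merged range each original range falls into.
hits <- findOverlaps(ir, merged)

# queryHits() indexes the original rows and subjectHits() the merged ranges,
# so assigning through queryHits() does not rely on the order of the hits.
df$group2 <- NA_integer_
df$group2[queryHits(hits)] <- subjectHits(hits)
df
```

Since every original range lies inside exactly one merged range, each row gets exactly one value in group2.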

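To install IRanges, type these lines in R (current Bioconductor releases install through the BiocManager package, which replaced the older biocLite script):

install.packages("BiocManager")
BiocManager::install("IRanges")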

To learn more (manual, etc.), see the IRanges page on the Bioconductor website.

Tags: dataframe, range, r, data.table, overlap

user1560292
