I am using R with the data.table package and I would like to group a data.table by running (time) intervals, i.e. overlapping bins. Within each of these running intervals I would like to count the occurrences of equal pairs of data. Furthermore, these "equal pairs of data" should not have to be exactly equal; they only need to lie within some range of each other.
The simple version of the question is as follows:
# Time X   Y   Counts
# ...  ... ... 1
# I would like to do:
DT[, sum(Counts), by = list(Time, X, Y)]
# with Time, X and Y being overlapping intervals.
findInterval() would give me bins with "hard borders", not overlapping ones.
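To make the difference concrete: findInterval() assigns each Time value to exactly one bin, whereas a running window lets one observation fall into several windows at once. A small sketch, assuming bin breaks every 5 time units (the breaks are chosen only for illustration) and the strict |Time - t| < 5 window used later in the question:

```r
Time <- c(1, 1, 2, 4, 5, 5, 6, 7, 8, 8, 8, 8, 12, 13)
Timeinterval <- 5

# Hard bins: every value lands in exactly one of [0,5), [5,10), [10,15)
hard_bins <- findInterval(Time, seq(0, 15, by = Timeinterval))

# Running windows: for each observation, count the observations lying
# strictly within +/- Timeinterval of it; these windows overlap
window_sizes <- vapply(Time, function(t) sum(abs(Time - t) < Timeinterval),
                       integer(1))

print(hard_bins)     # 1 1 1 1 2 2 2 2 2 2 2 2 3 3
print(window_sizes)  # 6 6 7 12 12 12 10 9 10 10 10 10 6 2
```

The window sizes add up to far more than the 14 rows, which is exactly the overlap that fixed bins cannot express.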
The problem in more detail: let's say I have a data.table like this:
library(data.table)
Time <- c(1,1,2,4,5,5,6,7,8,8,8,8,12,13)
# repeated Time values are allowed
X <- c(6,6,7,10,5,7,6,3,9,10,6,3,3,6)
Y <- c(2,6,10,3,4,6,6,9,4,9,6,6,9,9)
DT <- data.table(Time, X, Y)
Time X Y
1: 1 6 2
2: 1 6 6
3: 2 7 10
4: 4 10 3
5: 5 5 4
6: 5 7 6
7: 6 6 6
8: 7 3 9
9: 8 9 4
10: 8 10 9
11: 8 6 6
12: 8 3 6
13: 12 3 9
14: 13 6 9
And some predefined interval sizes:
Timeinterval <- 5
#for a time value of 10 this means to look from 10-5 to 10+5
RangeX.percentage <- 0.5
RangeY.percentage <- 0.5
The result should be an additional column, let's call it "count", holding the number of occurrences of equal pairs of X and Y, where "equal" means within the ranges for X and Y.
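For a concrete reading of that definition, take row 2 (Time = 1, X = 6, Y = 6). With strict inequalities, as in the loop further down, its window is every row with |Time - 1| < 5, its X range is (3, 9) and its Y range is (3, 9). A minimal base-R check of that single row, reusing the vectors and parameters defined above:

```r
Time <- c(1, 1, 2, 4, 5, 5, 6, 7, 8, 8, 8, 8, 12, 13)
X <- c(6, 6, 7, 10, 5, 7, 6, 3, 9, 10, 6, 3, 3, 6)
Y <- c(2, 6, 10, 3, 4, 6, 6, 9, 4, 9, 6, 6, 9, 9)
Timeinterval <- 5
RangeX.percentage <- 0.5
RangeY.percentage <- 0.5

i <- 2
in_window <- abs(Time - Time[i]) < Timeinterval   # rows 1 to 6
in_x <- X > X[i] * (1 - RangeX.percentage) & X < X[i] * (1 + RangeX.percentage)
in_y <- Y > Y[i] * (1 - RangeY.percentage) & Y < Y[i] * (1 + RangeY.percentage)

matches <- which(in_window & in_x & in_y)
matches           # 2 5 6
length(matches)   # 3, the count for row 2
```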
I thought about some kind of grouping by time intervals like
c(1, 1, 2, 4, 5, 5, 6) # for the first item: (1-5):(1+5)
c(1, 1, 2, 4, 5, 5, 6) # for the second item: (1-5):(1+5)
c(1, 1, 2, 4, 5, 5, 6, 7) # for the third item: (2-5):(2+5)
# ...
c(8, 8, 8, 8, 12, 13) # for the last item: (13-5):(13+5)
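Because these windows depend only on Time, sorting by Time turns every window into a contiguous block of rows, so each block can be described by a start and an end row index instead of being searched for. A sketch of that idea, assuming Time is sorted ascending and using the strict rule |Time - t| < Timeinterval (borders excluded) that the loop below uses; the `left.open` argument of findInterval() requires R 3.3.0 or later:

```r
Time <- c(1, 1, 2, 4, 5, 5, 6, 7, 8, 8, 8, 8, 12, 13)  # must be sorted
Timeinterval <- 5

# start: first row with Time > t - Timeinterval
#   findInterval(x, Time) counts the elements of Time that are <= x
start <- findInterval(Time - Timeinterval, Time) + 1L
# end: last row with Time < t + Timeinterval
#   with left.open = TRUE it counts the elements of Time that are < x
end <- findInterval(Time + Timeinterval, Time, left.open = TRUE)

windows <- Map(seq.int, start, end)  # row indices of every running window
Time[windows[[1]]]                   # 1 1 2 4 5 5
Time[windows[[14]]]                  # 12 13
```

Each lookup is a binary search, so finding all window borders costs O(n log n); the X/Y comparison then only touches the rows inside each block.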
and the following conditions for the data (but maybe there is a simpler version for that part, too).
EDIT: To clarify what the result should look like:
Ranges <- DT[ , list(
  X * (1 + RangeX.percentage), X * (1 - RangeX.percentage),
  Y * (1 + RangeY.percentage), Y * (1 - RangeY.percentage))]
DT2 <- cbind(DT, Ranges, count = rep(1L, nrow(DT)))
setnames(DT2, c("Time","X","Y","XR1","XR2","YR1","YR2","count"))
for (i in seq_len(nrow(DT2))) {
  # main part of the question: how to get this done within data.table?
  DT2.subset <- DT2[abs(Time - DT2$Time[i]) < Timeinterval]
  # subsequent comparison of X and Y (strict inequalities);
  # set() writes the result into row i of DT2 by reference
  set(DT2, i = i, j = "count",
      value = sum(DT2.subset$X < DT2$XR1[i] &
                  DT2.subset$X > DT2$XR2[i] &
                  DT2.subset$Y < DT2$YR1[i] &
                  DT2.subset$Y > DT2$YR2[i]))
}
DT2
Time X Y XR1 XR2 YR1 YR2 count
1: 1 6 2 9.0 3.0 3.0 1.0 1
2: 1 6 6 9.0 3.0 9.0 3.0 3
3: 2 7 10 10.5 3.5 15.0 5.0 4
4: 4 10 3 15.0 5.0 4.5 1.5 3
5: 5 5 4 7.5 2.5 6.0 2.0 1
6: 5 7 6 10.5 3.5 9.0 3.0 6
7: 6 6 6 9.0 3.0 9.0 3.0 4
8: 7 3 9 4.5 1.5 13.5 4.5 2
9: 8 9 4 13.5 4.5 6.0 2.0 3
10: 8 10 9 15.0 5.0 13.5 4.5 4
11: 8 6 6 9.0 3.0 9.0 3.0 4
12: 8 3 6 4.5 1.5 9.0 3.0 1
13: 12 3 9 4.5 1.5 13.5 4.5 2
14: 13 6 9 9.0 3.0 13.5 4.5 1
As my complete data.table contains more than a million rows, comparing each row against all DT$Time values is far too slow.
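Besides foverlaps (suggested in the answer below), data.table 1.9.8 and later support non-equi joins, which can express the strict window and range conditions directly as a self-join, with no need to shift the Time bounds. A sketch, assuming a data.table version with non-equi join support; the helper columns Tlo, Thi, XR1, XR2, YR1 and YR2 are illustrative names:

```r
library(data.table)

Time <- c(1, 1, 2, 4, 5, 5, 6, 7, 8, 8, 8, 8, 12, 13)
X <- c(6, 6, 7, 10, 5, 7, 6, 3, 9, 10, 6, 3, 3, 6)
Y <- c(2, 6, 10, 3, 4, 6, 6, 9, 4, 9, 6, 6, 9, 9)
DT <- data.table(Time, X, Y)
Timeinterval <- 5
RangeX.percentage <- 0.5
RangeY.percentage <- 0.5

# One row of (exclusive) bounds per observation
Ranges <- DT[, .(Tlo = Time - Timeinterval, Thi = Time + Timeinterval,
                 XR1 = X * (1 - RangeX.percentage), XR2 = X * (1 + RangeX.percentage),
                 YR1 = Y * (1 - RangeY.percentage), YR2 = Y * (1 + RangeY.percentage))]

# Self non-equi join: for each row of Ranges, count the rows of DT that fall
# strictly inside all bounds; by = .EACHI gives one result per row of Ranges
counts <- DT[Ranges,
             on = .(Time > Tlo, Time < Thi, X > XR1, X < XR2, Y > YR1, Y < YR2),
             .N, by = .EACHI]$N

DT[, count := counts]
DT
```

Because the comparisons are strict in the join itself, this variant also works for non-integer Time values.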
You could try data.table::foverlaps.

We will create Ranges pretty much as you did, just with the addition of Time ranges and a row index (for later aggregation). The main issue here is that you don't want <= or >= but rather < and >. foverlaps treats intervals as closed, so for the (integer) Time values we shrink each window by 1 on both sides; the X and Y conditions stay strict because they are applied as a filter after the join. Then we add a Time interval to DT too, key it, and run foverlaps. The final stage is to count observations per row.
DT[, Time2 := Time] ## Add higher interval to DT
setkey(DT, Time, Time2) ## key (for foverlaps)
Ranges <-
DT[ , .(Time = Time - Timeinterval + 1, ## Add lower Time interval
Time2 = Time + Timeinterval - 1, ## Add higher Time interval
XR1 = X* (1 - RangeX.percentage),
XR2 = X* (1 + RangeX.percentage),
YR1 = Y* (1 - RangeY.percentage),
YR2 = Y* (1 + RangeY.percentage),
indx = .I)] ## Add row index
# Run foverlaps and count incidences by condition while updating DT by reference
DT[,
count := foverlaps(Ranges, DT)[X > XR1 & X < XR2 & Y > YR1 & Y < YR2,
.N,
keyby = indx]$N]
DT
# Time X Y Time2 count
# 1: 1 6 2 1 1
# 2: 1 6 6 1 3
# 3: 2 7 10 2 4
# 4: 4 10 3 4 3
# 5: 5 5 4 5 1
# 6: 5 7 6 5 6
# 7: 6 6 6 6 4
# 8: 7 3 9 7 2
# 9: 8 9 4 8 3
# 10: 8 10 9 8 4
# 11: 8 6 6 8 4
# 12: 8 3 6 8 1
# 13: 12 3 9 12 2
# 14: 13 6 9 13 1
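One caveat: `$N` is assigned by position, which assumes every row has at least one match. That holds here because each row matches itself when its X and Y are positive, but if X or Y were 0 (or negative) the range would be empty, that row would be missing from the aggregated table, and the remaining counts would shift. A sketch that keeps the same foverlaps setup but joins the counts back on `indx` and fills rows without matches with 0:

```r
library(data.table)

Time <- c(1, 1, 2, 4, 5, 5, 6, 7, 8, 8, 8, 8, 12, 13)
X <- c(6, 6, 7, 10, 5, 7, 6, 3, 9, 10, 6, 3, 3, 6)
Y <- c(2, 6, 10, 3, 4, 6, 6, 9, 4, 9, 6, 6, 9, 9)
DT <- data.table(Time, X, Y)
Timeinterval <- 5
RangeX.percentage <- 0.5
RangeY.percentage <- 0.5

DT[, Time2 := Time]
setkey(DT, Time, Time2)
Ranges <- DT[, .(Time = Time - Timeinterval + 1, Time2 = Time + Timeinterval - 1,
                 XR1 = X * (1 - RangeX.percentage), XR2 = X * (1 + RangeX.percentage),
                 YR1 = Y * (1 - RangeY.percentage), YR2 = Y * (1 + RangeY.percentage),
                 indx = .I)]

# Matches per row index; rows with no match at all are absent from this table
counts <- foverlaps(Ranges, DT)[X > XR1 & X < XR2 & Y > YR1 & Y < YR2,
                                .N, keyby = indx]

# Join back on the row index instead of relying on position, then fill gaps
DT[, indx := .I]
DT[counts, count := i.N, on = "indx"]
DT[is.na(count), count := 0L]
DT[, indx := NULL]
DT
```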