Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping sequential values with "gap tolerance"

Tags:

r

I have this df:

FRAME   TRACK_ID   SUM   TC_17
1       15         0     1
2       15         0     1
3       15         0     1
4       15         0     -1
5       15         0     1
6       15         0     1
7       15         0     -1
8       15         0     -1
9       15         0     1
10      15         0     1

Now I am using this code to get the Frames at which TC_17 has the value 1:

for (i in 1:length(IDs)) {
  temp <- get(paste("TRACK_", IDs[i], sep = ""))
  temp3 <- paste("TRACK_", IDs[i], sep = "")
  if (ncol(temp)==3) {
    print(paste("No contacts detected for Track", IDs[i]))
    next
  }
  for (j in 4:ncol(temp)) {
    contact <- which(temp[,j] == 1) - 1
    contact <- sort(contact)
    Contact_No <- cumsum(c(1, abs(contact[-length(contact)] - contact[-1]) > 1))
    temp2 <- by(contact, Contact_No, identity)
  }
  assign(paste(temp3, colnames(temp)[j], sep = "_"), temp2)
}

This returns the list TRACK_15_TC_17:

Contact_No: 1
[1] 1 2 3
-------------------------------------------------------------------------------- 
Contact_No: 2
[1] 5 6
-------------------------------------------------------------------------------- 
Contact_No: 3
[1] 9 10

So far so good, but I want this code to be able to include some sort of gap tolerance of 1 frame. So that the final list looks something like this:

Contact_No: 1
[1] 1 2 3 5 6 
-------------------------------------------------------------------------------- 
Contact_No: 2
[1] 9 10

Contact_No 1 and former Contact_No 2 have been merged together because there is only a gap of size 1 between the last value of Contact_No 1 and the first value of former Contact_No 2. I have tried something along the lines of:

for (k in 1:length(temp2)) {
  if (k+1>length(temp2)) {
    next
  }
  if ((temp2[[k]][length(temp2[[k]])])-(temp2[[k+1]][1])<=1 & (k+1) < length(temp2)) {
    ListTemp <- c(temp2[[k]][length(temp2[[k]])], temp2[[k+1]])
    print(ListTemp)
  }
}

However, this does not seem to work. If someone could help me with that I would be very grateful! (I'm also open to completely different approaches)

like image 237
eFFecX Avatar asked Nov 14 '19 12:11

eFFecX


2 Answers

One way is to use rle, find where there is only 1 -1, replace that with one, get new values using rep and split based on the diff not being 1 (i.e consecutive values), i.e.

i1 <- rle(df$TC_17)

#Run Length Encoding
#  lengths: int [1:5] 3 1 2 2 2
#  values : int [1:5] 1 -1 1 -1 1

i1$values[which(i1$lengths == 1 & i1$values == -1)] <- 1

#Run Length Encoding
#  lengths: int [1:5] 3 1 2 2 2
#  values : num [1:5] 1 1 1 -1 1

i2 <- which(rep(i1$values, i1$lengths) == 1)
#[1]  1  2  3  4  5  6  9 10


split(i2, cumsum(c(TRUE, diff(i2) != 1)))
#$`1`
#[1] 1 2 3 4 5 6

#$`2`
#[1]  9 10
like image 83
Sotos Avatar answered Oct 25 '22 17:10

Sotos


In your case, the gap value is 2. You can change the gap by replacing the value in the second line of code.

ind <- which(df$TC_17 == 1)
split(ind, cumsum(c(TRUE, diff(ind) > 2)))

# $`1`
# [1] 1 2 3 5 6
# 
# $`2`
# [1]  9 10
like image 40
Darren Tsai Avatar answered Oct 25 '22 15:10

Darren Tsai