
Assigning values in a sequence to a group of consecutive rows leaving some rows empty

I'm trying to group runs of consecutive rows (assigning each run the same value) while leaving some rows empty (when a certain condition is not fulfilled).

My data are locations (xy coordinates), the date/time at which they were measured, and the time span between measurements. Somewhat simplified, they look like this:

ID   X     Y      Time    Span
1    3445  7671   0:00    -
2    3312  7677   4:00    4
3    3309  7680   12:00   8
4    3299  7681   16:00   4
5    3243  7655   20:00   4
6    3222  7612   4:00    8
7    3260  7633   0:00    4
8    3254  7641   8:00    8
9    3230  7612   0:00    16
10   3203  7656   4:00    4
11   3202  7678   8:00    4
12   3159  7609   20:00   12
...

I'd like to assign the same value to every run of consecutive locations measured within a time span of 4 hours, leave the other rows as NA, and make my data look like this:

ID   X     Y      Time    Span  Sequence
1    3445  7671   0:00    -     -
2    3312  7677   4:00    4     1
3    3309  7680   12:00   8     NA
4    3299  7681   16:00   4     2
5    3243  7655   20:00   4     2
6    3222  7612   4:00    8     NA
7    3260  7633   0:00    4     3
8    3254  7641   8:00    8     NA
9    3230  7612   0:00    16    NA
10   3203  7656   4:00    4     4
11   3202  7678   8:00    4     4
12   3159  7609   20:00   12    NA

I've tried several approaches using a "for" loop plus an "ifelse" condition, like:

Sequence <- for (i in 1:max(ID)) {
ifelse (Span <= 4, i+1, "NA")
}

without any luck. I know my attempt is incorrect, but my programming skills are really basic and I haven't found any similar problem on the web.

Any ideas would be much appreciated!

sblazquez asked Feb 18 '23 18:02


2 Answers

Here is a longish one-liner (assuming your data.frame is named DF):

ifelse(x <- DF$Span == 4, cumsum(c(head(x, 1), tail(x, -1) - head(x, -1) == 1)), NA)
# [1] NA  1 NA  2  2 NA  3 NA NA  4  4 NA

Explanation:

  • x is a vector of TRUE/FALSE showing where Span is 4.
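  • tail(x, -1) is a safe way of writing x[2:length(x)] (which counts backwards when x has a single element).
  • head(x, -1) is a safe way of writing x[1:(length(x)-1)] (which has the same problem, since 1:0 is c(1, 0)).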
  • tail(x, -1) - head(x, -1) == 1 is a vector of TRUE/FALSE showing where we went from Span != 4 to Span == 4.
  • since the vector above is one element shorter than x, I prepended head(x, 1) in front of it. head(x, 1) is a safe way of writing x[1].
  • Then I take the cumsum, which converts the TRUE/FALSE vector into a vector of non-decreasing integers: where Span jumps from != 4 to == 4 it increases by 1, otherwise it stays constant.
  • Everything is wrapped into an ifelse so you only see numbers where x is TRUE, i.e., where Span == 4.
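The same steps can be unrolled into named intermediate vectors, which may be easier to follow than the one-liner. This is only a sketch: the Span vector below is typed in by hand from the question's table and is assumed to be character because of the "-" in the first row; starts, run_id and Sequence are names made up for the illustration.

```r
# Hand-typed copy of the question's Span column (assumed to be character).
Span <- c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12")

x <- Span == 4                                         # TRUE where Span is 4
starts <- c(head(x, 1), tail(x, -1) - head(x, -1) == 1) # TRUE where a run of 4s begins
run_id <- cumsum(starts)                               # run counter, carried across the gaps
Sequence <- ifelse(x, run_id, NA)                      # keep the counter only inside a run

Sequence
# [1] NA  1 NA  2  2 NA  3 NA NA  4  4 NA
```

Printing starts and run_id next to x is a quick way to check each bullet above against real values.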
flodel answered Feb 21 '23 16:02


Here's another alternative using rle and rep. We'll assume that your data.frame is named "test".
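To follow along, something like the following should recreate the sample data from the question. It is a hand-typed sketch: storing Time and Span as character (because of the "-" in the first row) is an assumption about how the real data were read in, not something the question states.

```r
# Sample data copied from the question; column types are assumed.
test <- data.frame(
  ID   = 1:12,
  X    = c(3445, 3312, 3309, 3299, 3243, 3222, 3260, 3254, 3230, 3203, 3202, 3159),
  Y    = c(7671, 7677, 7680, 7681, 7655, 7612, 7633, 7641, 7612, 7656, 7678, 7609),
  Time = c("0:00", "4:00", "12:00", "16:00", "20:00", "4:00",
           "0:00", "8:00", "0:00", "4:00", "8:00", "20:00"),
  Span = c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12"),
  stringsAsFactors = FALSE
)
```

Comparing a character Span with the number 4 still works, because R converts the 4 to "4" before comparing.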

First, initialize your "Sequence" column, filling it with NA.

test$Sequence <- NA

Second, specify the condition that you are matching, in this case, test$Span == 4.

x <- test$Span == 4

Third, combine the two parts of rle's output (lengths and values) to get the length of each run of consecutive TRUE values, in order of appearance.

spanSeq <- rle(x)$lengths[rle(x)$values == TRUE]
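If rle is new to you, it may help to look at its two components for this condition before subsetting. A small sketch, with x typed in by hand to match Span == 4 for the sample data, and r as a throwaway name:

```r
# The Span == 4 condition for the 12 sample rows, typed in by hand.
x <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

r <- rle(x)
r$lengths
# [1] 1 1 1 2 1 1 2 2 1
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

# Keep only the lengths of the TRUE runs: four runs, of 1, 2, 1 and 2 rows.
spanSeq <- r$lengths[r$values]
spanSeq
# [1] 1 2 1 2
```

Each entry of spanSeq is one run, so seq_along(spanSeq) numbers the runs and spanSeq says how many rows each number has to cover.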

Finally, use rep with the times argument set to the result obtained in step 3. Subset the required values of test$Sequence according to the index matched by test$Span == 4, and replace them with your new sequence.

test$Sequence[x] <- rep(seq_along(spanSeq), times = spanSeq)
test
#    ID    X    Y  Time Span Sequence
# 1   1 3445 7671  0:00    -       NA
# 2   2 3312 7677  4:00    4        1
# 3   3 3309 7680 12:00    8       NA
# 4   4 3299 7681 16:00    4        2
# 5   5 3243 7655 20:00    4        2
# 6   6 3222 7612  4:00    8       NA
# 7   7 3260 7633  0:00    4        3
# 8   8 3254 7641  8:00    8       NA
# 9   9 3230 7612  0:00   16       NA
# 10 10 3203 7656  4:00    4        4
# 11 11 3202 7678  8:00    4        4
# 12 12 3159 7609 20:00   12       NA

Once you understand the steps involved, you can also do this directly with within(). The following would give you the same result:

within(test, {
  Sequence <- NA
  r <- rle(Span == 4)                # compute the runs once
  spanSeq <- r$lengths[r$values]     # lengths of the TRUE runs only
  Sequence[Span == 4] <- rep(seq_along(spanSeq), times = spanSeq)
  rm(r, spanSeq)                     # drop helpers so they don't become columns
})
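If you need this for other conditions (the question's own attempt used Span <= 4 rather than Span == 4), the same rle idea can be wrapped in a small function. This is a sketch only: run_ids is a hypothetical helper name, and treating NA as "condition not met" is an assumption you may want to change.

```r
# Hypothetical helper: number each run of consecutive TRUEs, NA elsewhere.
run_ids <- function(cond) {
  cond <- !is.na(cond) & cond        # assumption: NA counts as FALSE
  r <- rle(cond)
  # Replace each run's value by its running count if TRUE, by NA if FALSE.
  r$values <- ifelse(r$values, cumsum(r$values), NA_integer_)
  inverse.rle(r)                     # expand the runs back to full length
}

# Numeric Span with NA for the first row (also an assumption).
span <- c(NA, 4, 8, 4, 4, 8, 4, 8, 16, 4, 4, 12)
run_ids(span <= 4)
# [1] NA  1 NA  2  2 NA  3 NA NA  4  4 NA
```

With a data.frame this would be used as test$Sequence <- run_ids(test$Span == 4).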
A5C1D2H2I1M1N2O1R2T1 answered Feb 21 '23 14:02