Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to restart a sequence based on values in another column OR reference the previous column's value in R

Tags:

r

I am trying to number in sequence locations gathered within a certain time period (those with time since previous location >60 seconds). I've eliminated columns irrelevant to this question, so example data looks like:

TimeSincePrev
1
1
1
1
511
1
2
286
1

My desired output looks like this: (sorry for the underscores, but I couldn't otherwise figure out how to get it to include my spaces to make the columns obvious...)

TimeSincePrev ___ NoInSeries
1 ________________ 1
1 ________________ 2
1 ________________ 3
1 ________________ 4
511 ______________ 1
1 ________________ 2
2 ________________ 3
286 ______________ 1
1 ________________ 2
...and so on for another 3500 lines

I have tried a couple of ways to approach this unsuccessfully:

First, I tried to do an ifelse, where I would make the NoInSequence 1 if the TimeSincePrev was more than a minute, or else the previous row's value +1..(In this case, I first insert a line number column to help me reference the previous row, but I suspect there is an easier way to do this?)

df$NoInSeries <- ifelse((dfTimeSincePrev > 60), 1, ((df[((df$LineNo)-1),"NoInSeries"])+1)).

I don't get any errors, but it only gives me the 1s where I want to restart sequences but does not fill in any of the other values:

TimeSincePrev ___ NoInSeries
1 ________________ NA
1 ________________ NA
1 ________________ NA
1 ________________ NA
511 ______________ 1
1 ________________ NA
2 ________________ NA
286 ______________ 1
1 ________________ NA
I assume this has something to do with trying to reference back to itself?

My other approach was to try to get it to do sequences of numbers (max 15), restarting every time there is a change in the TimeSincePrev value:

df$NoInSeries <- ave(df$TimeSincePrev, df$TimeSincePrev, FUN=function(y) 1:15)

I still get no errors but exactly the same output as before, with NAs in place and no other numbers filled in.

Thanks for any help!

like image 926
MPhD Avatar asked Oct 19 '14 07:10

MPhD


1 Answers

Using ave after creating a group detecting serie's change using (diff + cumsum)

dt$NoInSeries <- 
      ave(dt$TimeSincePrev,
          cumsum(dt$TimeSincePrev >60),
          FUN=seq)

The result is:

dt
# TimeSincePrev NoInSeries
# 1             1          1
# 2             1          2
# 3             1          3
# 4             1          4
# 5           511          1
# 6             1          2
# 7             2          3
# 8           286          1
# 9             1          2

add steps explanation:

## detect time change > 60 seconds 
## group value by the time change
(gg <- cumsum(dt$TimeSincePrev >60))
[1] 0 0 0 0 1 1 1 2 2

## get the sequence by group 
ave(dt$TimeSincePrev, gg, FUN=seq)
[1] 1 2 3 4 1 2 3 1 2
like image 91
agstudy Avatar answered Sep 18 '22 06:09

agstudy