I'm analysing temporal patterns in a complex data set consisting of several environmental variables as well as activity data from various animal species. These data have been collected by multiple experimental setups, and data from each setup have been stored once per minute. The project has been running for several years now, so my data set is rather large.
The first few lines of one of my datasets look like this:
> head(setup_01)
DateTime Film_number unused PIR Wheel Temperature LightOld LightDay LightNight LightUV IDnumbers error mouse shrew vole rat frog rest extra_info odour
1 2015-03-10 12:27:10 x 0 0 13.40 1471.34 -0.97 1331.29 700.42 no error 0 0 0 0 0 0 1
2 2015-03-10 12:28:10 x 0 0 13.43 1471.38 -1.07 1291.11 731.32 no error 0 0 0 0 0 0 1
3 2015-03-10 12:29:10 x 0 0 13.31 1471.24 -1.08 1368.57 1016.02 no error 0 0 0 0 0 0 1
As I want to relate these variables to various natural cycles like sunrise and sunset throughout the seasons, I have made use of the package maptools
to calculate sunrise and sunset times
library(maptools)
gpclibPermit()
#set coordinates
crds=c(4.4900,52.1610)
# download the sunrise/sunset/etc data
setup_01$sunrise=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunrise")
setup_01$sunset=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunset")
#create a variable that's 0 except at sunrise, and one that's 0 except at sunset
setup_01$sunrise_act=0
setup_01$sunset_act=0
setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunrise"]$time))<30,]$sunrise_act=1
setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunset"]$time))<30,]$sunset_act=1
As the behaviour of most animals differs, depending on whether it is day or night, I have used sunset/sunrise times to data to calculate a new variable that's 0 at night and 1 at daytime:
#create a variable that's 0 at night and 1 at daytime
setup_01$daytime=0
setup_01[setup_01[,"DateTime"]>setup_01[,"sunrise"]$time & setup_01[,"DateTime"]<setup_01[,"sunset"]$time,]$daytime=1
So far, so good... it's even possible with maptools
to use the start of civil/nautical/astronomical dusk and dawn instead of sunrise and sunset.
This, however, is where my problem starts. I want to number all the days in my experiment. And rather than increasing the day counter at midnight, as is usual and easy to do, I want to increase the day counter at sunset (or possibly in future experiments another moveable time of day like sunrise, nautical dusk and dawn,...). As sunset does not happen at the same time every day, it is not - for me - a straightforward problem to solve.
I've only come up with a for
-loop, which is not a nice way of doing things. Also, given that I have more than 6 years worth of data points collected once a minute in several setups, I can go sit and watch the tectonic plates move while R runs through a whole bunch of loops like these:
setup_01$day=0
day<-1
for(i in 1:nrow(setup_01)){
setup_01[i,]$day<-day
if(setup_01[i,]$sunset_act==1){
day<-day+1
}
}
Apart from being ugly and slow, this code has one big problem: it does not deal with missing values. Sometimes, due to equipment failure, data have not been recorded at all for hours or days. If no data have been recorded during a sunset, the above code does not increase the day counter. This means I need to - somehow - incorporate the date/time codes as well. It is easy to create a variable of days since the start of the experiment:
setup_01$daynumber<-as.integer(ceiling(difftime(setup_01$DateTime, setup_01$DateTime[1], units = "days")))
Perhaps these numbers can be used, possibly in conjunction with Heroka's nice rle
-algorithm.
I have used dput
to make a few months worth of data from one setup, including a few large chunks of missing data, as well as the newly created variables (as described in this post and in Heroka's answer) available here.
I've looked for something better, nicer and especially faster, but have been unable to come up with a nice trick. I've fiddled with subsetting my dataframe, but come to the conclusion that it is probably a silly approach. I've looked at maptools
, lubridate
, and GeoLight
. I've searched Google, Stack Overflow and various books, like Hadley Wickham's fantastic Advanced R. All to no avail. Perhaps I'm missing something obvious though. I'm hoping someone here can help me.
I came up with a solution on generated 0's and 1's (as you have already generated those), and it works with runlengths.
#sunset/sunrise is series of 0's and 1's indicating night and daytime, so solution that works for random sequence
#will work for OP's dataset
set.seed(10)
sunset <- c(1,rbinom(20,1,0.5))
#counter needs to be x for sequence of 11111 (day) and 0000(night), and then increase when 0 reappears
#counter starts at 1
#intermediate step: number each half-day
rle_sunset <- rle(sunset)
period <- rep(1:length(rle_sunset$lengths),rle_sunset$lengths)
#calculate day so that each two subsequent periods are one day
day <- ceiling(period/2)
> cbind(sunset,period,day)
sunset period day
[1,] 1 1 1
[2,] 1 1 1
[3,] 0 2 1
[4,] 0 2 1
[5,] 1 3 2
[6,] 0 4 2
[7,] 0 4 2
[8,] 0 4 2
[9,] 0 4 2
[10,] 1 5 3
[11,] 0 6 3
[12,] 1 7 4
[13,] 1 7 4
[14,] 0 8 4
[15,] 1 9 5
[16,] 0 10 5
[17,] 0 10 5
[18,] 0 10 5
[19,] 0 10 5
[20,] 0 10 5
[21,] 1 11 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With