I have a dataset like the one shown below (15 years of data) containing half-hourly interval data and the occurrence of an event (1 means it happened, 0 means it did not).
Date Event
2002-04-27 19:30:00 0
2002-04-27 20:00:00 0
2002-04-27 20:30:00 0
2002-04-27 21:00:00 0
2002-04-27 21:30:00 1
2002-04-27 22:00:00 1
2002-04-27 22:30:00 0
2002-04-27 23:00:00 0
2002-04-27 23:30:00 1
2002-04-28 00:00:00 1
2002-04-28 00:30:00 1
2002-04-28 01:00:00 1
2002-04-28 01:30:00 0
2002-04-28 02:00:00 0
2002-04-28 02:30:00 0
2002-04-28 03:00:00 0
2002-04-28 03:30:00 0
2002-04-28 04:00:00 0
2002-04-28 04:30:00 0
2002-04-28 05:00:00 0
2002-04-28 05:30:00 0
2002-04-28 06:00:00 0
2002-04-28 06:30:00 0
2002-04-28 07:00:00 0
What I would like to do is calculate, for each day (e.g. 2002-04-27), the number of events that occurred. However, consecutive 1's count as just one event, and so does a run of 1's that crosses into the next day. For example, 2002-04-27 23:30:00 has a 1, and so does 2002-04-28 00:00:00, but that would be considered only one event, which occurred on 2002-04-27. An output like the one below would be ideal.
Date No_Event
2002-04-27 2
2002-04-28 0
So, how would I go about doing this? Any help is very much appreciated.
Using lubridate (to group by day) and data.table:
library(data.table)
library(lubridate)
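setDT(df)  # convert the data.frame to a data.table by reference
# Keep only rows where Event differs from the previous row, then sum Event per day:
# each 0 -> 1 change (an event start) adds 1, each 1 -> 0 change adds 0.
# Note: a day on which Event never changes has no rows left and is absent from the result.
df[Event != shift(Event, fill = 0), sum(Event), by = floor_date(Date, unit = "day")]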
# floor_date V1
#1: 2002-04-27 2
#2: 2002-04-28 0
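To make the logic of the one-liner easier to follow, here is a minimal sketch that spells out the "start of an event" flag as its own column. The names dt, is_start, per_day, Day and No_Event are chosen for illustration only, and as.Date() with an explicit tz is used in place of floor_date() so the sketch needs only data.table. Because every row is kept and only the flag is summed, a day without any event start still shows up with a count of 0.

```r
library(data.table)

# Same sample data as in this answer, rebuilt here so the sketch is self-contained.
dt <- data.table(
  Date  = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
              as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
            0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# A row starts an event when it is 1 and the previous row was 0.
# fill = 0L treats the missing row before the first one as "no event".
dt[, is_start := Event == 1L & shift(Event, fill = 0L) == 0L]

# Count the starts per calendar day (illustrative column names).
per_day <- dt[, .(No_Event = sum(is_start)), by = .(Day = as.Date(Date, tz = "GMT"))]
print(per_day)
```

An event that runs past midnight is flagged only on its first row, so it is attributed to the day on which it began.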
The df used in the above example:
df <- data.frame(Date=seq(as.POSIXct("2002-04-27 19:30:00", tz="GMT"), as.POSIXct("2002-04-28 07:00:00", tz="GMT"), by="30 min"),
                 Event=c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L))
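If you would rather not load any packages, the same idea can probably be expressed in base R. This is only a sketch under the assumption that the data is sorted by time with no gaps; is_start, day, counts and result are illustrative names, not part of the original answer.

```r
# Sample data matching the example in this answer, rebuilt so the sketch runs on its own.
df <- data.frame(
  Date  = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
              as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
            0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# diff() against a leading 0 equals 1 exactly where a run of 1s begins.
is_start <- diff(c(0L, df$Event)) == 1L

# Group by calendar day (as text, in the data's own time zone) and count the starts.
day    <- format(df$Date, "%Y-%m-%d", tz = "GMT")
counts <- tapply(is_start, day, sum)

result <- data.frame(Date     = as.Date(names(counts)),
                     No_Event = as.integer(counts),
                     row.names = NULL)
print(result)
```

Since tapply() sees every row, days without any event start are kept with a count of 0.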