I hope this is an acceptable R/data.table problem.
I have a 3-column table with:
id geographic location IDs (303,453 locations)month month over 25 years 1990-2014spei a climatic index that varies between -7 and 7.I need to count the occurrence of droughts at each location over the entire 1990-2014 period. A drought event is defined as "a period in which the SPEI is continuously negative and the SPEI reaches a value of -1.0 or less. Drought starts when the SPEI first falls below zero and ends with the first positive SPEI value following a value of -1.0 or less".
I know this should be feasible using shift() and rolling joins but would very welcome some help!
# Sample table structure
dt <- data.table(
id = rep(1:303453, each=25*12),
month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), 303453),
spei = runif(303453*25*12, -7, 7))
# A minimal example with 1 location over 12 months
library(data.table)
library(xts)
dt <- data.table(
id = rep("loc1", each=12),
month = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), "month"),
spei = c(-2, -1.1, -0.5, 1.2, -1.2, 2.3, -1.7, -2.1, 0.9, 1.2, -0.9, -0.2))
spei.ts <- xts(dt$spei, order.by=dt$month, frequency="month")
plot(spei.ts, type="bars")

This shows 3 drought events over a 1-year period. This is what I need to identify and count.
Hoping some of you are more used to working with time series. Many thanks, --Mel.
Here is a starting point to get the result you want. Probably experts can suggest improvements in speed.
EDIT: improved speed ~8x by removing paste.
library(data.table)
set.seed(42)
n <- 300 # 303453 will be ~1000 times slower
dt <- data.table(
id = rep(1:n, each=25*12),
month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), n),
spei = runif(n*25*12, -7, 7))
system.time({
dt[, `:=`(neg = (spei < 0), neg1 = (spei <= -1))]
dt[, runid := ifelse(neg, rleid(neg), NA)]
res <- dt[!is.na(runid),
.(length = .N[any(neg1)], start = min(month), end = max(month)),
by = .(id, runid)][!is.na(length)]
})
# user system elapsed
# 0.345 0.000 0.344
# counts of droughts per id:
res[, .(nDroughts = .N), by = id]
# list of droughts per id: (NB: don't include 1st positive value after)
res[, .(droughtN = seq_len(.N), start, end), by = id]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With