Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using data.table to summarize monthly sequences (count specific events)

I hope this is an acceptable R/data.table problem.

I have a 3-column table with:

  • id geographic location IDs (303,453 locations)
  • month month over 25 years 1990-2014
  • spei a climatic index that varies between -7 and 7.

I need to count the occurrence of droughts at each location over the entire 1990-2014 period. A drought event is defined as "a period in which the SPEI is continuously negative and the SPEI reaches a value of -1.0 or less. Drought starts when the SPEI first falls below zero and ends with the first positive SPEI value following a value of -1.0 or less".

I know this should be feasible using shift() and rolling joins but would very welcome some help!

# Sample table structure
dt <- data.table(
  id = rep(1:303453, each=25*12),
  month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), 303453),
  spei = runif(303453*25*12, -7, 7))

# A minimal example with 1 location over 12 months
library(data.table)
library(xts)

dt <- data.table(
  id = rep("loc1", each=12),
  month = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), "month"),
  spei = c(-2, -1.1, -0.5, 1.2, -1.2, 2.3, -1.7, -2.1, 0.9, 1.2, -0.9, -0.2))

spei.ts <- xts(dt$spei, order.by=dt$month, frequency="month")
plot(spei.ts, type="bars")

enter image description here

This shows 3 drought events over a 1-year period. This is what I need to identify and count.

Hoping some of you are more used to working with time series. Many thanks, --Mel.

like image 464
mbacou Avatar asked Apr 26 '26 20:04

mbacou


1 Answers

Here is a starting point to get the result you want. Probably experts can suggest improvements in speed.

EDIT: improved speed ~8x by removing paste.

library(data.table)
set.seed(42)
n <- 300  # 303453 will be ~1000 times slower
dt <- data.table(
    id = rep(1:n, each=25*12),
    month = rep(seq(as.Date("1990-01-01"), as.Date("2014-12-31"), "month"), n),
    spei = runif(n*25*12, -7, 7))

system.time({
  dt[, `:=`(neg = (spei < 0), neg1 = (spei <= -1))]
  dt[, runid := ifelse(neg, rleid(neg), NA)]
  res <- dt[!is.na(runid), 
            .(length = .N[any(neg1)], start = min(month), end = max(month)), 
            by = .(id, runid)][!is.na(length)]

})
#    user  system elapsed 
#   0.345   0.000   0.344 

# counts of droughts per id:
res[, .(nDroughts = .N), by = id]

# list of droughts per id: (NB: don't include 1st positive value after) 
res[, .(droughtN = seq_len(.N), start, end), by = id]
like image 156
Max Avatar answered Apr 28 '26 10:04

Max



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!