I currently have a data frame which contains Name of Station, Date of Rainfall, Amount of Rainfall (Example attached) I am interested in exploring the number of days (and/or months) it takes each Station to reach a particular amount of rainfall. For example:

Is it possible to obtain an output like the one above based on a dataset like the example one? My initial thought process is to filter each station individually, join to a calendar dataframe which extracts the min and max from that range, count the days between them and use a case_when to categorize them. This approach seems a bit convoluted and would appreciate any guidance as to what would be a better approach.
Thanks for the suggestions!
Example Dataset:
Example <- structure(list(Name.Station = c("Station A", "Station A", "Station A",
"Station A", "Station A", "Station B", "Station B", "Station B",
"Station C", "Station C", "Station C", "Station C"), Rainfall.Date = c("7/10/2020",
"8/12/2020", "8/01/2021", "25/06/2021", "26/10/2021", "7/01/2020",
"22/01/2020", "5/02/2020", "5/09/2020", "5/10/2020", "5/11/2020",
"5/12/2020"), Rainfall.Amount = c(210, 210, 208.47, 208.16, 203.67,
227.49, 225, 222.54, 250, 250, 246.18, 245.15)), class = "data.frame", row.names = c(NA,
-12L))
by station you could calculate the cumsum of rainfall greater than the threshold in mm. Then calculate the length of the sequence of days from start date to the date which is maximum in the cumsum.
First of all, though, your dates should be formatted properly.
Example <- transform(Example, Rainfall.Date=as.Date(Rainfall.Date, '%d/%m/%Y'))
do.call(rbind, by(Example, Example$Name.Station, \(x) {
f <- \(mm, x.=x) {
mx <- which.max(cumsum(x.$Rainfall.Amount) > mm)
length(do.call(seq.Date, c(as.list(range(x.$Rainfall.Date[1:mx])), 1)))
}
ds <- seq.int(200, 1e3, 200) ## sequence of 200, 400, ... , 1000mm
r <- t(vapply(ds, f, 0))
data.frame(Name.Station=el(x$Name.Station), `colnames<-`(r, paste0('d_', ds)))
}))
# Name.Station d_200 d_400 d_600 d_800 d_1000
# Station A Station A 1 63 94 262 385
# Station B Station B 1 16 30 1 1
# Station C Station C 1 31 62 92 1
Note: R >= 4.1 used.
Here is a tidyverse approach:
library(dplyr)
library(tidyr)
Example %>%
group_by(Name.Station) %>%
mutate(Rainfall.Date = as.Date(Rainfall.Date, "%d/%m/%Y"),
days = cumsum(c(1, diff(Rainfall.Date))),
crainfall = cumsum(Rainfall.Amount),
fi = (findInterval(crainfall, seq(0, 1000, 200)) -1) * 200) %>%
pivot_wider(id_cols = Name.Station, names_from = fi, values_from = days, names_glue = {"days_to_{fi}_mm"}, values_fn = min)
# A tibble: 3 x 6
# Groups: Name.Station [3]
Name.Station days_to_200_mm days_to_400_mm days_to_600_mm days_to_800_mm days_to_1000_mm
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Station A 1 63 94 262 385
2 Station B 1 16 30 NA NA
3 Station C 1 31 62 92 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With