I'm stuck trying to write code that extracts 5 consecutive days from a vector of dates, starting from a given start date (month and day), for every year in the vector.
Example:
v = seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date = "01-01" #(mm-dd)
Expected output:
2000-01-01 #(YYYY:mm:dd)
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2001-01-01
2001-01-02
2001-01-03
2001-01-04
2001-01-05
...
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05
If the initial date is, for example, 12-31 (mm:dd), the result should be:
2000-12-31
2001-01-01
2001-01-02
2001-01-03
2001-01-04
2001-12-31
2002-01-01
2002-01-02
2002-01-03
2002-01-04
...
2019-12-31
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-12-31
Any tips?
You could use str_detect() to find the starting value and use the index to find the next 4 rows.
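library(dplyr)
initial_date <- "12-31"
tibble(dates = v) |>
  # flag rows whose date ends in the chosen month-day; the "$" anchor stops
  # "01-01" from also matching dates such as 2001-01-15
  mutate(index = stringr::str_detect(as.character(dates), paste0(initial_date, "$"))) |>
  # keep each flagged row plus the 4 rows after it
  slice(sort(which(index) + rep(0:4, each = sum(index)))) |>
  select(-index)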
Output:
# A tibble: 97 × 1
dates
<date>
1 2000-12-31
2 2001-01-01
3 2001-01-02
4 2001-01-03
5 2001-01-04
6 2001-12-31
7 2002-01-01
8 2002-01-02
9 2002-01-03
10 2002-01-04
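The same index-based idea can be sketched in base R without any packages. This is only an illustration, not part of the answer above: the helper name take_from and its arguments are hypothetical, and it assumes v is sorted and contains every day with no gaps, so that "next 4 elements" means "next 4 days".

```r
# Hypothetical helper: take n consecutive elements of v starting at each
# date whose month-day equals start_md.
# Assumption: v is a sorted Date vector with one entry per day (no gaps).
take_from <- function(v, start_md, n = 5) {
  # positions of the start dates, one per year in which start_md occurs
  starts <- which(format(v, "%m-%d") == start_md)
  # expand each start position into n consecutive positions
  idx <- unlist(lapply(starts, function(i) i + seq_len(n) - 1))
  # drop positions that run past the end of v, and any duplicates
  idx <- unique(idx[idx <= length(v)])
  v[idx]
}

v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
res <- take_from(v, "12-31")
head(res, 6)
```

The explicit idx <= length(v) filter handles the last start date (2020-12-31), whose following days are not in v.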
n <- 5
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date <- "12-31"
tmp <- rowSums(
  sapply(seq_len(n) - 1, function(z)
    (v-z) %in% v &
      format(v-z, format = "%m-%d") == initial_date)
) > 0
v[tmp]
# [1] "2000-12-31" "2001-01-01" "2001-01-02" "2001-01-03" "2001-01-04" "2001-12-31" "2002-01-01" "2002-01-02" "2002-01-03"
# [10] "2002-01-04" "2002-12-31" "2003-01-01" "2003-01-02" "2003-01-03" "2003-01-04" "2003-12-31" "2004-01-01" "2004-01-02"
# [19] "2004-01-03" "2004-01-04" "2004-12-31" "2005-01-01" "2005-01-02" "2005-01-03" "2005-01-04" "2005-12-31" "2006-01-01"
# [28] "2006-01-02" "2006-01-03" "2006-01-04" "2006-12-31" "2007-01-01" "2007-01-02" "2007-01-03" "2007-01-04" "2007-12-31"
# [37] "2008-01-01" "2008-01-02" "2008-01-03" "2008-01-04" "2008-12-31" "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04"
# [46] "2009-12-31" "2010-01-01" "2010-01-02" "2010-01-03" "2010-01-04" "2010-12-31" "2011-01-01" "2011-01-02" "2011-01-03"
# [55] "2011-01-04" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-12-31" "2013-01-01" "2013-01-02"
# [64] "2013-01-03" "2013-01-04" "2013-12-31" "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04" "2014-12-31" "2015-01-01"
# [73] "2015-01-02" "2015-01-03" "2015-01-04" "2015-12-31" "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-12-31"
# [82] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04" "2017-12-31" "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04"
# [91] "2018-12-31" "2019-01-01" "2019-01-02" "2019-01-03" "2019-01-04" "2019-12-31" "2020-01-01" "2020-01-02" "2020-01-03"
# [100] "2020-01-04" "2020-12-31"
(v-z) %in% v is necessary, since otherwise it would "match" 1999-12-31, which is not in the original v.
format(v-z, format = "%m-%d") checks whether the date z days earlier (for every element of v) matches initial_date.
sapply returns a length(v)-row, 5-column matrix of logicals, where each row indicates whether the corresponding element of v should be retained (that is, whether it falls within the 5 days starting at initial_date). To determine whether any value in a row is true, we use rowSums(.) > 0, which returns a logical vector the same length as v.
Note: I tried doing this all at once using outer(v, 0:4, `-`), but matrices in R don't tend to keep the "Date" class, and working around that (though it can be done) is a bit onerous. The solution is to produce a matrix that isn't classed as a date, hence doing the comparison within sapply. I tried to reduce the number of "loops" as much as possible, hence iterating over 0:4 instead of v.
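For reference, here is one possible way the outer() idea from the note could be made to work. It is a sketch under assumptions rather than the answerer's method: it runs outer() on the underlying day counts (as.numeric(v)) and rebuilds Date values afterwards. The names days_back, back_dates, hit and res_outer are made up for illustration.

```r
n <- 5
v <- seq.Date(as.Date("2000-01-01"), as.Date("2020-12-31"), 1)
initial_date <- "12-31"

# outer() on the numeric day counts: element [i, j] is v[i] minus (j - 1) days.
# Assumption: v holds whole-day Date values.
days_back <- outer(as.numeric(v), seq_len(n) - 1, `-`)

# Flatten the matrix (column-major) and rebuild Dates; R stores a Date as the
# number of days since 1970-01-01.
flat <- as.vector(days_back)
back_dates <- as.Date(flat, origin = "1970-01-01")

# Same two conditions as in the sapply version, reshaped back into a
# length(v)-row, n-column logical matrix (also filled column-major).
hit <- matrix(
  format(back_dates, "%m-%d") == initial_date & flat %in% as.numeric(v),
  nrow = length(v)
)

res_outer <- v[rowSums(hit) > 0]
head(res_outer, 6)
```

Whether this is any clearer than the sapply version is debatable; it mainly shows where the "Date" class has to be dropped and restored.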