I have a daily timeseries and I want to get the minumum for every month of every year, but I want to make sure that the results should be 10 days apart at least. To be more specific lets explain on the following sample dataframe.
>Data
Years Months Days Date A B
1 2003 December 1 2003-12-01 10 10
2 2003 December 2 2003-12-02 10 10
3 2003 December 3 2003-12-03 10 10
4 2003 December 4 2003-12-04 10 10
5 2003 December 5 2003-12-05 10 10
6 2003 December 6 2003-12-06 10 10
7 2003 December 7 2003-12-07 10 10
8 2003 December 8 2003-12-08 3 10
9 2003 December 9 2003-12-09 10 10
10 2003 December 10 2003-12-10 10 10
11 2003 December 11 2003-12-11 10 10
12 2003 December 12 2003-12-12 10 4
13 2003 December 13 2003-12-13 10 10
14 2003 December 14 2003-12-14 10 10
15 2003 December 15 2003-12-15 10 10
16 2003 December 16 2003-12-16 10 10
17 2003 December 17 2003-12-17 10 10
18 2003 December 18 2003-12-18 10 10
19 2003 December 19 2003-12-19 10 10
20 2003 December 20 2003-12-20 10 10
21 2003 December 21 2003-12-21 10 10
22 2003 December 22 2003-12-22 10 10
23 2003 December 23 2003-12-23 10 10
24 2003 December 24 2003-12-24 10 10
25 2003 December 25 2003-12-25 10 10
26 2003 December 26 2003-12-26 10 10
27 2003 December 27 2003-12-27 10 10
28 2003 December 28 2003-12-28 10 10
29 2003 December 29 2003-12-29 10 10
30 2003 December 30 2003-12-30 10 10
31 2003 December 31 2003-12-31 10 10
32 2004 January 1 2004-01-01 10 10
33 2004 January 2 2004-01-02 10 10
34 2004 January 3 2004-01-03 10 10
35 2004 January 4 2004-01-04 10 10
36 2004 January 5 2004-01-05 10 10
37 2004 January 6 2004-01-06 10 10
38 2004 January 7 2004-01-07 10 10
39 2004 January 8 2004-01-08 10 10
40 2004 January 9 2004-01-09 10 10
41 2004 January 10 2004-01-10 10 10
42 2004 January 11 2004-01-11 10 10
43 2004 January 12 2004-01-12 10 10
44 2004 January 13 2004-01-13 10 10
45 2004 January 14 2004-01-14 10 10
46 2004 January 15 2004-01-15 10 10
47 2004 January 16 2004-01-16 10 10
48 2004 January 17 2004-01-17 10 10
49 2004 January 18 2004-01-18 10 10
50 2004 January 19 2004-01-19 10 10
51 2004 January 20 2004-01-20 10 10
52 2004 January 21 2004-01-21 10 10
53 2004 January 22 2004-01-22 10 10
54 2004 January 23 2004-01-23 10 10
55 2004 January 24 2004-01-24 10 10
56 2004 January 25 2004-01-25 5 4
57 2004 January 26 2004-01-26 10 10
58 2004 January 27 2004-01-27 10 10
59 2004 January 28 2004-01-28 10 10
60 2004 January 29 2004-01-29 10 10
61 2004 January 30 2004-01-30 10 10
62 2004 January 31 2004-01-31 10 10
63 2004 February 1 2004-02-01 10 10
64 2004 February 2 2004-02-02 5 4
65 2004 February 3 2004-02-03 10 10
66 2004 February 4 2004-02-04 10 10
67 2004 February 5 2004-02-05 10 10
68 2004 February 6 2004-02-06 10 10
69 2004 February 7 2004-02-07 10 10
70 2004 February 8 2004-02-08 10 10
71 2004 February 9 2004-02-09 7 6
72 2004 February 10 2004-02-10 10 10
73 2004 February 11 2004-02-11 10 10
74 2004 February 12 2004-02-12 10 10
75 2004 February 13 2004-02-13 10 10
76 2004 February 14 2004-02-14 10 10
77 2004 February 15 2004-02-15 10 10
78 2004 February 16 2004-02-16 10 10
79 2004 February 17 2004-02-17 10 10
80 2004 February 18 2004-02-18 10 10
81 2004 February 19 2004-02-19 10 10
82 2004 February 20 2004-02-20 10 10
83 2004 February 21 2004-02-21 10 10
84 2004 February 22 2004-02-22 10 10
85 2004 February 23 2004-02-23 10 10
86 2004 February 24 2004-02-24 10 10
87 2004 February 25 2004-02-25 10 10
88 2004 February 26 2004-02-26 10 10
89 2004 February 27 2004-02-27 10 10
90 2004 February 28 2004-02-28 10 10
91 2004 February 29 2004-02-29 10 10
I want to do almost what aggregate() does
min <- aggregate(Data[5:6], by= list(Data$Months, Data$Years), FUN = min)
Group.1 Group.2 A B
December 2003 3 4
January 2004 5 4
February 2004 5 4
BUT instead to get for Feb the minimum value for each A and B that are at least 10 days apart from the previous months' min value.
So I would like to get:
Group.1 Group.2 A B
December 2003 3 4
January 2004 5 4
February 2004 7 6
Any ideas?
This solution is only a dozen lines. We first split the input data frame into a list of data frames ym
each of which represents a year/month. Then we sapply over the columns we wish to calculate minimums for. For each column, we iterate over the ym
components such that for each component, i.e. for each data.frame, we subset it to s
, a data frame of rows that are at least 10 days after the prior minDate
, calculate row of the minimum, ix
, update minDate
and return the result
:
ym <- split(DF, format(DF$Date, "%Y-%m"))
sapply(c("A", "B"), function(col) {
minDate <- min(DF$Date) - 10
result <- vector(length = length(ym))
for(i in seq_along(ym)) {
s <- subset(ym[[i]], Date >= minDate + 10)
ix <- which.min(s[[col]])
minDate <- s$Date[ix]
result[i] <- min(s[[col]][ix])
}
setNames(result, names(ym))
})
This gives:
A B
2003-12 3 4
2004-01 5 4
2004-02 7 6
(We only use the "Date"
, "A
" and "B"
columns of DF
so we could have reduced DF
to that first.)
Note: We assumed this data frame as input:
DF <-
structure(list(Years = c(2003L, 2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L,
2004L, 2004L, 2004L, 2004L), Months = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("December", "February",
"January"), class = "factor"), Days = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L), Date = structure(c(12387, 12388, 12389,
12390, 12391, 12392, 12393, 12394, 12395, 12396, 12397, 12398,
12399, 12400, 12401, 12402, 12403, 12404, 12405, 12406, 12407,
12408, 12409, 12410, 12411, 12412, 12413, 12414, 12415, 12416,
12417, 12418, 12419, 12420, 12421, 12422, 12423, 12424, 12425,
12426, 12427, 12428, 12429, 12430, 12431, 12432, 12433, 12434,
12435, 12436, 12437, 12438, 12439, 12440, 12441, 12442, 12443,
12444, 12445, 12446, 12447, 12448, 12449, 12450, 12451, 12452,
12453, 12454, 12455, 12456, 12457, 12458, 12459, 12460, 12461,
12462, 12463, 12464, 12465, 12466, 12467, 12468, 12469, 12470,
12471, 12472, 12473, 12474, 12475, 12476, 12477), class = "Date"),
A = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 3L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 5L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 5L, 10L, 10L, 10L, 10L, 10L, 10L, 7L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), B = c(10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 4L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 4L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 4L, 10L, 10L, 10L, 10L, 10L, 10L, 6L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L)), .Names = c("Years", "Months",
"Days", "Date", "A", "B"), row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59",
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70",
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81",
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91"), class = "data.frame")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With