 

Aggregate function in R zoo returns error

I have been using this line of code daily for a very long time, and something broke after the return to standard time. I am trying to use aggregate.zoo to rank hourly data by day. The period of data I am looking at does not include the DST change, so I don't understand what the issue is. I am using the zoo package. Here is the structure of the data:

require(zoo)

z <- structure(c(15.52, 14.56, 14.31, 14.17, 13.75, 15.3, 25.57, 25.39, 
23.43, 22.92, 23.31, 23.44, 22.09, 21.28, 21, 20.94, 27.16, 32.73, 
33.74, 29.12, 24.78, 21.44, 18.95, 17.08, 17.9, 17.54, 16.45, 
16.59, 16.09, 17.23, 25.31, 25.43, 24.93, 24.47, 23.69, 21.53, 
19.61, 19.53, 19.38, 19.38, 24.59, 29.03, 30.02, 23.78, 20.44, 
21.39, 18.79, 21.7, 20.5, 20.63, 18.57, 19.41, 19.2, 15.23, 23.48, 
24.89, 24.79, 24.01, 23.18, 22.5, 20.88, 21.12, 20, 20.55, 27.83, 
31.21, 28.29, 26.1, 23.31, 21.64, 18.19, 17.28, 17.87, 17.63, 
16.48, 17.27, 17.04, 17.79, 19.78, 20.92, 23.53, 25.45, 24.37, 
21.89, 21.22, 21.03, 19.92, 20.14, 24.9, 27.88, 28.54, 29.14, 
24.93, 19.24, 21.75, 18.66, 19.21, 18.49, 17.08, 15.92, 15.51, 
15.5, 16.1, 16.61, 18.54, 20.15, 20.36, 20.32, 19.94, 19.13, 
18.58, 18.71, 21.1, 26.77, 28.98, 27.12, 24.71, 20.83, 18.2, 
16.68, 17.56, 16.86, 15.79, 15.47, 15.49, 17.66, 24.05, 24.97, 
25.1, 25.74, 25.32, 24.98, 24.13, 24.05, 23.45, 23.84, 26.16, 
31.2, 32.57, 30.19, 26.1, 22, 18.9, 17.85, 17.7, 16.53, 15.32, 
14.77, 15.06, 17.54, 24.94, 26.06, 25.52, 25.85, 25.53, 24.97, 
24.17, 24.19, 23.55, 23.78, 26.35, 31.49, 33.06, 29.9, 25.52, 
21.62, 18.77, 17.77, 17.87, 16.75, 15.38, 14.69, 15, 17.84, 25.05, 
25.5, 24.92, 25.46, 25.03, 24.81, 24.22, 23.54, 22.95, 22.9, 
25.82, 30.87, 32.49, 29.17, 25.13, 21.56, 18.58, 17.67, 17.06, 
15.96, 14.45, 13.93, 14.27, 16.64, 23.29, 24.17, 24.07, 24.22, 
24.06, 24.06, 23.24, 23.05, 22.39, 22.54, 25.07, 29.89, 31.53, 
28.24, 24.28, 20.78, 18.09, 17.02), index = structure(c(1478671200, 
1478674800, 1478678400, 1478682000, 1478685600, 1478689200, 1478692800, 
1478696400, 1478700000, 1478703600, 1478707200, 1478710800, 1478714400, 
1478718000, 1478721600, 1478725200, 1478728800, 1478732400, 1478736000, 
1478739600, 1478743200, 1478746800, 1478750400, 1478754000, 1478757600, 
1478761200, 1478764800, 1478768400, 1478772000, 1478775600, 1478779200, 
1478782800, 1478786400, 1478790000, 1478793600, 1478797200, 1478800800, 
1478804400, 1478808000, 1478811600, 1478815200, 1478818800, 1478822400, 
1478826000, 1478829600, 1478833200, 1478836800, 1478840400, 1478844000, 
1478847600, 1478851200, 1478854800, 1478858400, 1478862000, 1478865600, 
1478869200, 1478872800, 1478876400, 1478880000, 1478883600, 1478887200, 
1478890800, 1478894400, 1478898000, 1478901600, 1478905200, 1478908800, 
1478912400, 1478916000, 1478919600, 1478923200, 1478926800, 1478930400, 
1478934000, 1478937600, 1478941200, 1478944800, 1478948400, 1478952000, 
1478955600, 1478959200, 1478962800, 1478966400, 1478970000, 1478973600, 
1478977200, 1478980800, 1478984400, 1478988000, 1478991600, 1478995200, 
1478998800, 1479002400, 1479006000, 1479009600, 1479013200, 1479016800, 
1479020400, 1479024000, 1479027600, 1479031200, 1479034800, 1479038400, 
1479042000, 1479045600, 1479049200, 1479052800, 1479056400, 1479060000, 
1479063600, 1479067200, 1479070800, 1479074400, 1479078000, 1479081600, 
1479085200, 1479088800, 1479092400, 1479096000, 1479099600, 1479103200, 
1479106800, 1479110400, 1479114000, 1479117600, 1479121200, 1479124800, 
1479128400, 1479132000, 1479135600, 1479139200, 1479142800, 1479146400, 
1479150000, 1479153600, 1479157200, 1479160800, 1479164400, 1479168000, 
1479171600, 1479175200, 1479178800, 1479182400, 1479186000, 1479189600, 
1479193200, 1479196800, 1479200400, 1479204000, 1479207600, 1479211200, 
1479214800, 1479218400, 1479222000, 1479225600, 1479229200, 1479232800, 
1479236400, 1479240000, 1479243600, 1479247200, 1479250800, 1479254400, 
1479258000, 1479261600, 1479265200, 1479268800, 1479272400, 1479276000, 
1479279600, 1479283200, 1479286800, 1479290400, 1479294000, 1479297600, 
1479301200, 1479304800, 1479308400, 1479312000, 1479315600, 1479319200, 
1479322800, 1479326400, 1479330000, 1479333600, 1479337200, 1479340800, 
1479344400, 1479348000, 1479351600, 1479355200, 1479358800, 1479362400, 
1479366000, 1479369600, 1479373200, 1479376800, 1479380400, 1479384000, 
1479387600, 1479391200, 1479394800, 1479398400, 1479402000, 1479405600, 
1479409200, 1479412800, 1479416400, 1479420000, 1479423600, 1479427200, 
1479430800, 1479434400, 1479438000, 1479441600, 1479445200), class = c("POSIXct", 
"POSIXt"), tzone = "America/New_York"), class = "zoo")

All I'm trying to do is rank the data by day, so the line I have been using successfully for a long time is:

ordered.price = aggregate(z, as.Date(index(z), tz='EST'), 
                          FUN=rank, ties.method = "first", na.last = FALSE)

When running this, I get this error:

Error in zoo(df, ix[!is.na(ix)]) : 
  "x" : attempt to define invalid zoo object

I am at a loss here, especially since the code was functioning properly for so long. I'm using R 3.2.3. Any help is really appreciated.

Mathieu asked Nov 11 '16


1 Answer

Below we give some approaches that avoid the error. (1) is not likely what you want, but it gives some insight before we get to the others.

1) padding FUN returns a different number of elements for different dates, so aggregate cannot create a rectangular result.

lens <- aggregate(z, as.Date(index(z), tz = 'EST'), FUN = length)

giving:

> lens
2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 
        23         24         24         24         24         24         24 
2016-11-16 2016-11-17 2016-11-18 
        24         24          1 

Padding each day to a length of 24 avoids the error:

aggregate(z, as.Date(index(z), tz = 'EST'), FUN = function(x)
  replace(rep(NA, max(lens)), seq_along(x),
          rank(x, ties.method = "first", na.last = FALSE)))

giving:

2016-11-09  6  4  3  2  1  5 19 18 15 13 14 16 12 10  9  8 20 22 23 21 17 11  7 NA
2016-11-10  4  7  6  2  3  1  5 21 22 20 18 16 15 12 11  9 10 19 23 24 17 13 14  8
2016-11-11 13  7  9  3  5  4  1 17 20 19 18 15 14 10 11  6  8 22 24 23 21 16 12  2
2016-11-12  4  7  5  1  3  2  6  9 12 17 21 18 16 14 13 10 11 19 22 23 24 20  8 15
2016-11-13 11 14  8  6  3  2  1  4  5  9 16 18 17 15 13 10 12 20 22 24 23 21 19  7
2016-11-14  4  6  5  3  1  2  7 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20  9  8
2016-11-15  7  6  4  3  1  2  5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17  9  8
2016-11-16  5  7  4  3  1  2  6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18  9  8
2016-11-17  7  6  4  3  1  2  5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20  9  8
2016-11-18  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Although padding avoids the error, it is not likely what you really want.
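If something closer to the original call is wanted, one possible refinement (a sketch, not part of the original answer) is to drop the incomplete days before aggregating, so that every group has exactly 24 hours and no padding is needed:

```r
library(zoo)

## Keep only the dates that have a full 24 hourly observations,
## then rank within each remaining day exactly as in the question.
dates <- as.Date(index(z), tz = "EST")
full  <- as.Date(names(which(table(dates) == 24)))
zz    <- z[dates %in% full]
aggregate(zz, as.Date(index(zz), tz = "EST"),
          FUN = rank, ties.method = "first", na.last = FALSE)
```

This silently discards the partial first and last days, which may or may not be acceptable depending on the application.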

2) time zone It is likely that the alignment with dates is off, which is why it produces 23 elements for the first date and 1 element for the last. If you fix that up so that there are 24 elements per date, then it will work without padding.

Using this value for tz seems to give 24 elements per date. (Running OlsonNames() gives a list of time zones.)
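For instance, the fixed-offset candidates can be listed with a one-liner (a small sketch; only the Etc/GMT* family is shown here):

```r
## List the fixed-offset Etc/GMT* zones among the installed tz names.
grep("^Etc/GMT", OlsonNames(), value = TRUE)
```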

table(as.Date(time(z), tz = "Etc/GMT+6"))

giving:

2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 
        24         24         24         24         24         24         24 
2016-11-16 2016-11-17 
        24         24 

Thus we can replace the tz="EST" used in the question with tz="Etc/GMT+6", giving this code:

aggregate(z, as.Date(index(z), tz='Etc/GMT+6'), 
     FUN=rank, ties.method = "first", na.last = FALSE)

which results in:

2016-11-09  6  4 3 2 1 5 20 19 16 14 15 17 13 11 10  9 21 23 24 22 18 12  8  7
2016-11-10  6  5 2 3 1 4 21 22 20 18 16 14 11 10  8  9 19 23 24 17 12 13  7 15
2016-11-11  8 10 4 6 5 1 17 20 19 18 15 14 11 12  7  9 22 24 23 21 16 13  3  2
2016-11-12  6  4 1 3 2 5  9 12 17 21 18 16 14 13 10 11 19 22 23 24 20  8 15  7
2016-11-13 14  9 7 3 2 1  4  5 10 16 18 17 15 13 11 12 20 22 24 23 21 19  8  6
2016-11-14  5  4 3 1 2 6 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20  9  8  7
2016-11-15  6  4 3 1 2 5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17  9  8  7
2016-11-16  7  4 3 1 2 6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18  9  8  5
2016-11-17  7  4 3 1 2 5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20  9  8  6

Although this gives the correct answer, fiddling with the time zone like this may not really be the best thing to do.

3) Noon Another way to approach this, if we know that (i) we are only concerned with date/times in the New York and GMT time zones and (ii) the data begins at 1am and has consecutive hours, is to use the datetimes at positions 12, 12+24, 12+48, etc., since noon falls on the same day in either of the two aforementioned time zones.

aggregate(z, as.Date(index(z)[rep(seq(12, length(z), 24), each = 24)]),
     FUN=rank, ties.method = "first", na.last = FALSE)

4) Subtract 1 second The poster commented that the problem is that his hours represent the end of a one-hour event, so the first 24 datetimes in z are

> head(time(z), 24)
c("2016-11-09 01:00:00 EST", "2016-11-09 02:00:00 EST", ..., 
    "2016-11-09 23:00:00 EST", "2016-11-10 00:00:00 EST")

which have hours of 01, 02, ..., 23, 00, so the last hour falls on the next day. If we subtract one second from each, we get 00, 01, 02, ..., 22, 23, so the hours are now all on the same day:

> head(time(z)-1, 24)
c("2016-11-09 00:59:59 EST", "2016-11-09 01:59:59 EST", ..., 
    "2016-11-09 22:59:59 EST", "2016-11-09 23:59:59 EST")

Thus we can use:

aggregate(z, as.Date(index(z)-1, tz = "EST"),
     FUN=rank, ties.method = "first", na.last = FALSE)

By subtracting one second, the times still reasonably refer to the end of each time period and we get the correct dates. We could alternatively subtract one hour instead of one second, as the poster proposed in a comment, in which case the times would represent the beginning of the hour instead of the end, as he notes.
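That one-hour variant can be sketched as follows (a sketch on the z series from the question; the grouping, and hence the ranks per day, come out the same as with one second):

```r
library(zoo)

## Variant of (4): subtract one hour (3600 s) so each stamp marks
## the start of its hour rather than the end, then rank by day.
aggregate(z, as.Date(index(z) - 3600, tz = "EST"),
          FUN = rank, ties.method = "first", na.last = FALSE)
```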

Note: Also consider whether the time zone "EST5EDT" is more appropriate to your problem.

G. Grothendieck answered Sep 16 '22