I've been working with R for just a few months, and I have a problem with a zoo series with data at five-minute intervals. There are no missing time points in the series, but there are some NaN values in the data.
>str(SerieCompleta)
‘zoo’ series from 2011-01-01 to 2011-12-31 23:55:00
Data: num [1:104737, 1] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "na.action")=Class 'omit' num [1:383] 2017 3745 5761 6786 6787 ...
Index: POSIXct[1:104737], format: "2011-01-01 00:00:00" "2011-01-01 00:05:00" ...
I need to find the maximum of groups of data, and groups of data should be separated by thirty or more consecutive minutes with zero values.
2011-01-02 05:15:00 0
2011-01-02 05:20:00 0
2011-01-02 05:25:00 0
2011-01-02 05:30:00 0
2011-01-02 05:35:00 0.1 |
2011-01-02 05:40:00 0.2 <--- maximum of group
2011-01-02 05:45:00 0.2 |
2011-01-02 05:50:00 0.1 |
2011-01-02 05:55:00 0.1 |
2011-01-02 06:00:00 0.1 |
2011-01-02 06:05:00 0.1 |
2011-01-02 06:10:00 0 |
2011-01-02 06:15:00 0 |
2011-01-02 06:20:00 0.1 |
2011-01-02 06:25:00 0
2011-01-02 06:30:00 0
2011-01-02 06:35:00 0
2011-01-02 06:40:00 0 thirty or more consecutive minutes with zero values on data
2011-01-02 06:45:00 0
2011-01-02 06:50:00 0
2011-01-02 06:55:00 0
2011-01-02 07:00:00 0.2 |
2011-01-02 07:05:00 2.5 <--- maximum of group
2011-01-02 07:10:00 0
Output should look like:
2011-01-02 05:40:00 0.2
2011-01-02 07:05:00 2.5
I don't know if there's a way to do this using an R feature. Thanks in advance for any suggestion.
I'll call your data column x (x includes only the numeric data, not the dates and times). I'll further assume that you have no missing time points and that all your time points are 5 minutes apart, so that six consecutive zeros correspond to thirty minutes. Here is a function that returns a two-column matrix, where each row contains the start and end indices of one group. A run of zeros at the very beginning or end of the series is not treated as a separator:
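blocks <- function(x) {
  z <- rle(x == 0)                # runs of zero / non-zero values
  # runs of six or more zeros (30+ minutes at 5-minute spacing) separate groups
  breaks <- which(z$lengths >= 6 & z$values == TRUE)
  # a zero run at the very start or end does not separate two groups
  breaks <- breaks[!breaks %in% c(1, length(z$lengths))]
  break.idx <- cumsum(z$lengths)  # last index of each run
  # starts: 1 and the index after each break; ends: the index before each break and length(x)
  cbind(c(1, break.idx[breaks] + 1), c(break.idx[breaks - 1], length(x)))
}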
For your data, you will get
> x
[1] 0.0 0.0 0.0 0.0 0.1 0.2 0.2 0.1 0.1 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0
[20] 0.0 0.0 0.1 2.5 0.0
> blocks(x)
[,1] [,2]
[1,] 1 14
[2,] 22 24
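To see where the boundaries 14 and 22 come from, it may help to look at the run-length encoding directly. This is only an illustrative sketch on the same sample vector; it repeats the first steps of blocks() outside the function.

```r
# Sample data from the question (24 readings at 5-minute spacing).
x <- c(0, 0, 0, 0, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0, 0, 0.1,
       0, 0, 0, 0, 0, 0, 0, 0.1, 2.5, 0)

z <- rle(x == 0)           # alternating runs of zero (TRUE) and non-zero (FALSE)
ends <- cumsum(z$lengths)  # last index of each run

print(z$lengths)  # 4 7 2 1 7 2 1
print(z$values)   # TRUE FALSE TRUE FALSE TRUE FALSE TRUE
print(ends)       # 4 11 13 14 21 23 24
```

Only the fifth run is a zero run of length six or more. It covers indices 15 to 21, so the first group ends at 14 and the second starts at 22. The two-element zero run at indices 12 and 13 is too short to split the first group.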
Now simply apply the which.max function to each group to get the index of that group's maximum value:
> apply(blocks(x), 1, function(i) {which.max(x[i[1]:i[2]]) + i[1] - 1})
[1] 6 23
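To get output in the form shown in the question (timestamp plus maximum), these indices can be used to subset both the times and the values. The following is a minimal base-R sketch, not a tested zoo solution: the names times and peaks are made up for illustration, and the timestamps are rebuilt by hand with an assumed UTC time zone. With the actual zoo object, x would presumably come from as.numeric(coredata(SerieCompleta)) and the times from index(SerieCompleta).

```r
# The grouping function from above, repeated so this sketch runs on its own.
blocks <- function(x) {
  z <- rle(x == 0)
  breaks <- which(z$lengths >= 6 & z$values == TRUE)
  breaks <- breaks[!breaks %in% c(1, length(z$lengths))]
  break.idx <- cumsum(z$lengths)
  cbind(c(1, break.idx[breaks] + 1), c(break.idx[breaks - 1], length(x)))
}

x <- c(0, 0, 0, 0, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0, 0, 0.1,
       0, 0, 0, 0, 0, 0, 0, 0.1, 2.5, 0)

# Hypothetical timestamps: one reading every 5 minutes starting at 05:15.
times <- seq(as.POSIXct("2011-01-02 05:15:00", tz = "UTC"),
             by = "5 min", length.out = length(x))

# Index of the (first) maximum within each group, relative to the whole series.
idx <- apply(blocks(x), 1, function(i) which.max(x[i[1]:i[2]]) + i[1] - 1)

# One row per group: time of the maximum and its value.
peaks <- data.frame(time = times[idx], max = x[idx])
print(peaks)
```

For the sample data this gives 05:40 with 0.2 and 07:05 with 2.5. Note that which.max returns the first position when a maximum occurs more than once, which is why 05:40 is reported rather than 05:45.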