I have an hourly weather data in the following format:
Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
...
...
12/31/2000 23:00,25
What I need is a daily aggregate of max, min, ave like this:
Date,MaxDBT,MinDBT,AveDBT
01/01/2000,36,23,28
01/02/2000,34,22,29
01/03/2000,32,25,30
...
...
12/31/2000,35,9,20
How to do this in R?
1) This can be done compactly using zoo:
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)
This gives:
> z
min max mean
2000-01-01 30 33 31.33333
2000-12-31 25 25 25.00000
2) here is a solution that only uses core R:
DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution
The last line gives:
> ag
Date DBT.min DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000
EDIT: (1) Since this first appeared the text= argument to read.zoo was added in the zoo package.
(2) minor improvements.
Using strptime(), trunc() and ddply() from the plyr package :
#Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ,header=T)
close(ZZ)
# Do the calculations
dataframe$Date <- strptime(dataframe$Date,format="%m/%d/%Y %H:%M")
dataframe$day <- trunc(dataframe$Date,"day")
require(plyr)
ddply(dataframe,.(day),
summarize,
aveDBT=mean(DBT),
maxDBT=max(DBT),
minDBT=min(DBT)
)
gives
day aveDBT maxDBT minDBT
1 2000-01-01 31.33333 33 30
2 2000-12-31 25.00000 25 25
To clarify :
strptime converts the character to dates according to the format. To see how you can specify the format, see ?strptime. trunc will then truncate these date-times to the specified unit, which is day in this case.
ddply will evaluate the function summarize within the dataframe after splitting it up according to day. everything after summarize are arguments that are passed to the function summarize.
There is also a nice package called hydroTSM. It uses zoo objects and can convert to other aggregates in time
The function in your case is subdaily2daily. You can choose if the aggregation should be based on min / max / mean...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With