I have an hourly weather data in the following format:
Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
...
...
12/31/2000 23:00,25
What I need is a daily aggregate of max, min, ave like this:
Date,MaxDBT,MinDBT,AveDBT
01/01/2000,36,23,28
01/02/2000,34,22,29
01/03/2000,32,25,30
...
...
12/31/2000,35,9,20
How to do this in R?
1) This can be done compactly using zoo:
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)
This gives:
> z
min max mean
2000-01-01 30 33 31.33333
2000-12-31 25 25 25.00000
2) here is a solution that only uses core R:
DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution
The last line gives:
> ag
Date DBT.min DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000
EDIT: (1) Since this first appeared the text=
argument to read.zoo
was added in the zoo package.
(2) minor improvements.
Using strptime()
, trunc()
and ddply()
from the plyr package :
#Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ,header=T)
close(ZZ)
# Do the calculations
dataframe$Date <- strptime(dataframe$Date,format="%m/%d/%Y %H:%M")
dataframe$day <- trunc(dataframe$Date,"day")
require(plyr)
ddply(dataframe,.(day),
summarize,
aveDBT=mean(DBT),
maxDBT=max(DBT),
minDBT=min(DBT)
)
gives
day aveDBT maxDBT minDBT
1 2000-01-01 31.33333 33 30
2 2000-12-31 25.00000 25 25
To clarify :
strptime
converts the character to dates according to the format. To see how you can specify the format, see ?strptime
. trunc
will then truncate these date-times to the specified unit, which is day in this case.
ddply
will evaluate the function summarize
within the dataframe after splitting it up according to day
. everything after summarize
are arguments that are passed to the function summarize
.
There is also a nice package called hydroTSM
. It uses zoo
objects and can convert to other aggregates in time
The function in your case is subdaily2daily
. You can choose if the aggregation should be based on min / max / mean...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With