I'm having trouble calculating the average temperature by hour.
I have a data frame with date, time (hh:mm:ss p.m./a.m.), and temperature. I need the mean temperature for each hour in order to plot the daily variation of temperature.
I'm new to R, but I gave it a try with what I know: I converted the hours to numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and cleaner than the "solution" I found.
I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer on extracting statistics from time data.
My data looks like this:
date hour temperature
1 28/12/2013 13:03:01 41.572
2 28/12/2013 13:08:01 46.059
3 28/12/2013 13:13:01 48.55
4 28/12/2013 13:18:01 49.546
5 28/12/2013 13:23:01 49.546
6 28/12/2013 13:28:01 49.546
7 28/12/2013 13:33:01 50.044
8 28/12/2013 13:38:01 50.542
9 28/12/2013 13:43:01 50.542
10 28/12/2013 13:48:01 51.04
11 28/12/2013 13:53:01 51.538
12 28/12/2013 13:58:01 51.538
13 28/12/2013 14:03:01 50.542
14 28/12/2013 14:08:01 51.04
15 28/12/2013 14:13:01 51.04
16 28/12/2013 14:18:01 52.534
17 28/12/2013 14:23:01 53.031
18 28/12/2013 14:28:01 53.031
19 28/12/2013 14:33:01 53.031
20 28/12/2013 14:38:01 51.538
21 28/12/2013 14:43:01 53.031
22 28/12/2013 14:48:01 53.529
etc. (24 hours of data)
I would like R to calculate the average per hour (ignoring differences in minutes or seconds, just grouping by hour).
Any suggestions? Thank you very much in advance!
Regards, Maria
Combine the date and hour columns into a POSIXct column and cut()
by hourly breaks:
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
date hour temperature
28/12/2013 13:03:01 41.572
28/12/2013 13:08:01 46.059
28/12/2013 13:13:01 48.55
28/12/2013 13:18:01 49.546
28/12/2013 13:23:01 49.546
28/12/2013 13:28:01 49.546
28/12/2013 13:33:01 50.044
28/12/2013 13:38:01 50.542
28/12/2013 13:43:01 50.542
28/12/2013 13:48:01 51.04
28/12/2013 13:53:01 51.538
28/12/2013 13:58:01 51.538
28/12/2013 14:03:01 50.542
28/12/2013 14:08:01 51.04
28/12/2013 14:13:01 51.04
28/12/2013 14:18:01 52.534
28/12/2013 14:23:01 53.031
28/12/2013 14:28:01 53.031
28/12/2013 14:33:01 53.031
28/12/2013 14:38:01 51.538
28/12/2013 14:43:01 53.031
28/12/2013 14:48:01 53.529
28/12/2013 15:01:01 50.77")
df$datehour <- cut(as.POSIXct(paste(df$date, df$hour),
format="%d/%m/%Y %H:%M:%S"), breaks="hour")
head(df)
date hour temperature datehour
1 28/12/2013 13:03:01 41.572 2013-12-28 13:00:00
2 28/12/2013 13:08:01 46.059 2013-12-28 13:00:00
3 28/12/2013 13:13:01 48.550 2013-12-28 13:00:00
4 28/12/2013 13:18:01 49.546 2013-12-28 13:00:00
5 28/12/2013 13:23:01 49.546 2013-12-28 13:00:00
6 28/12/2013 13:28:01 49.546 2013-12-28 13:00:00
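The sample above uses 24-hour times, but the question mentions an hh:mm:ss p.m./a.m. format. If a file really stores times that way, the format string needs %I and %p instead of %H. The following is only a sketch: the input values are made up, the exact "p.m."/"a.m." spelling is an assumption, and how %p is matched depends on the locale, so check it against your own data.

```r
# Hypothetical 12-hour input; the "p.m."/"a.m." spelling is assumed from the question.
date <- c("28/12/2013", "28/12/2013", "29/12/2013")
hour <- c("01:03:01 p.m.", "11:58:01 p.m.", "12:05:00 a.m.")

# Normalise "p.m." to "PM" (drop the dots, upper-case) so that %p can match it.
ampm <- toupper(gsub(".", "", hour, fixed = TRUE))

# %I is the 12-hour clock hour, %p the AM/PM indicator (locale-dependent).
stamp <- as.POSIXct(paste(date, ampm),
                    format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")

# Back to 24-hour strings, just to inspect the result.
hours24 <- format(stamp, "%H:%M:%S")
print(hours24)
```

Once stamp is a POSIXct vector, the cut() step works exactly as shown above.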
Now aggregate by that hourly column:
means <- aggregate(temperature ~ datehour, df, mean)
head(means)
datehour temperature
1 2013-12-28 13:00:00 49.17192
2 2013-12-28 14:00:00 52.23470
3 2013-12-28 15:00:00 50.77000
plot(as.POSIXct(means$datehour), means$temperature, type="l", las=1,
main="Hourly Avg Temperatures", xlab="Hour", ylab="")
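The datehour column keeps the date, so each day gets its own set of hourly means. If the data cover several days and you want one mean per hour of day (a typical daily cycle), one option is to group on the hour alone. This is a sketch on a small made-up data frame; df2, hod and daily_cycle are names I chose for illustration, and the temperatures are not real measurements.

```r
# Made-up sample spanning two days (values are illustrative only).
df2 <- data.frame(
  date = c("28/12/2013", "28/12/2013", "29/12/2013", "29/12/2013"),
  hour = c("13:03:01", "14:08:01", "13:13:01", "14:18:01"),
  temperature = c(40, 50, 44, 54),
  stringsAsFactors = FALSE
)

stamp <- as.POSIXct(paste(df2$date, df2$hour),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Hour of day (0-23) as an integer, discarding the date part.
df2$hod <- as.integer(format(stamp, "%H"))

# One mean per hour of day, pooled over all dates.
daily_cycle <- aggregate(temperature ~ hod, df2, mean)
print(daily_cycle)
```

Pooling over dates only makes sense if the days are comparable; for a single 24-hour file the two approaches give the same numbers.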
But, for time series data, I like to use package xts:
library(xts)
df.xts <- xts(df$temperature, as.POSIXct(paste(df$date, df$hour),
format="%d/%m/%Y %H:%M:%S"))
head(df.xts)
[,1]
2013-12-28 13:03:01 41.572
2013-12-28 13:08:01 46.059
2013-12-28 13:13:01 48.550
2013-12-28 13:18:01 49.546
2013-12-28 13:23:01 49.546
2013-12-28 13:28:01 49.546
means <- period.apply(df.xts, endpoints(df.xts, "hours"), mean)
head(means)
[,1]
2013-12-28 13:58:01 49.17192
2013-12-28 14:48:01 52.23470
2013-12-28 15:01:01 50.77000
Notice how the timestamps are the last entry of each hour. We can align the timestamps (down) to the beginning of the hour with this function:
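# Shift the index back by n seconds, then let align.time() round up to the next
# multiple of n; the net effect is rounding down to the start of the period.
align.time.down <- function(x, n) { index(x) <- index(x) - n; align.time(x, n) }
means.rounded <- align.time.down(means, 60*60)
# 2nd argument is the number of seconds to adjust/round to,
# just like function align.time()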
head(means.rounded)
[,1]
2013-12-28 13:00:00 49.17192
2013-12-28 14:00:00 52.23470
2013-12-28 15:00:00 50.77000
plot(means.rounded, las=1, main="Hourly Avg Temperatures")
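Since you have many files, it may help to wrap the steps in a function and apply it to each data set. Below is a minimal base-R sketch using the cut()/aggregate() approach; hourly_means is a name I made up, the two small data frames stand in for files, and the commented read.table() call assumes files laid out like your sample (the "data" folder and the .txt pattern are placeholders).

```r
# Hypothetical helper: hourly mean temperature for one data frame
# with columns date (dd/mm/yyyy), hour (HH:MM:SS) and temperature.
hourly_means <- function(d) {
  stamp <- as.POSIXct(paste(d$date, d$hour),
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  d$datehour <- cut(stamp, breaks = "hour")
  aggregate(temperature ~ datehour, d, mean)
}

# Two tiny made-up data frames standing in for files.
logs <- list(
  a = data.frame(date = "28/12/2013",
                 hour = c("13:03:01", "13:08:01", "14:03:01"),
                 temperature = c(40, 42, 50)),
  b = data.frame(date = "29/12/2013",
                 hour = c("09:00:00", "09:30:00"),
                 temperature = c(10, 20))
)

# One result per input, returned as a named list.
results <- lapply(logs, hourly_means)
print(results)

# With real files, something along these lines (paths are placeholders):
# files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
# results <- lapply(files, function(f)
#   hourly_means(read.table(f, header = TRUE, stringsAsFactors = FALSE)))
```

Fixing tz inside the function keeps the hourly bins independent of the machine's time zone; change it if your logger records local time and daylight saving matters.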