
How to calculate average of a variable by hour in R

Tags: time, r, average

I'm having trouble when trying to calculate the average temperature by hour.

I have a data frame with date, time (hh:mm:ss p.m./a.m.), and temperature columns. I need the mean temperature for each hour so that I can plot the daily variation of temperature.

I'm new to R, but I gave it a try with what I know: I converted the hours to numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that I would much prefer something more automated and cleaner than the "solution" I found.

I believe there must be a better way to calculate hourly averages in R, so I've been looking for an answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.

My data looks like this:

          date     hour temperature
1   28/12/2013 13:03:01      41.572
2   28/12/2013 13:08:01      46.059
3   28/12/2013 13:13:01       48.55
4   28/12/2013 13:18:01      49.546
5   28/12/2013 13:23:01      49.546
6   28/12/2013 13:28:01      49.546
7   28/12/2013 13:33:01      50.044
8   28/12/2013 13:38:01      50.542
9   28/12/2013 13:43:01      50.542
10  28/12/2013 13:48:01       51.04
11  28/12/2013 13:53:01      51.538
12  28/12/2013 13:58:01      51.538
13  28/12/2013 14:03:01      50.542
14  28/12/2013 14:08:01       51.04
15  28/12/2013 14:13:01       51.04
16  28/12/2013 14:18:01      52.534
17  28/12/2013 14:23:01      53.031
18  28/12/2013 14:28:01      53.031
19  28/12/2013 14:33:01      53.031
20  28/12/2013 14:38:01      51.538
21  28/12/2013 14:43:01      53.031
22  28/12/2013 14:48:01      53.529
etc. (24 hours of data)

I would like R to calculate the average per hour (ignoring the minutes and seconds, and grouping only by hour).

Any suggestions? Thank you very much in advance!

Regards, Maria

Maria asked Jul 09 '14 04:07


1 Answer

Combine the date and hour columns into a POSIXct column and cut() by hourly breaks:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
date hour temperature
28/12/2013 13:03:01  41.572
28/12/2013 13:08:01  46.059
28/12/2013 13:13:01  48.55
28/12/2013 13:18:01  49.546
28/12/2013 13:23:01  49.546
28/12/2013 13:28:01  49.546
28/12/2013 13:33:01  50.044
28/12/2013 13:38:01  50.542
28/12/2013 13:43:01  50.542
28/12/2013 13:48:01  51.04
28/12/2013 13:53:01  51.538
28/12/2013 13:58:01  51.538
28/12/2013 14:03:01  50.542
28/12/2013 14:08:01  51.04
28/12/2013 14:13:01  51.04
28/12/2013 14:18:01  52.534
28/12/2013 14:23:01  53.031
28/12/2013 14:28:01  53.031
28/12/2013 14:33:01  53.031
28/12/2013 14:38:01  51.538
28/12/2013 14:43:01  53.031
28/12/2013 14:48:01  53.529
28/12/2013 15:01:01  50.77")

df$datehour <- cut(as.POSIXct(paste(df$date, df$hour),
   format="%d/%m/%Y %H:%M:%S"), breaks="hour") 
head(df)
        date     hour temperature            datehour
1 28/12/2013 13:03:01      41.572 2013-12-28 13:00:00
2 28/12/2013 13:08:01      46.059 2013-12-28 13:00:00
3 28/12/2013 13:13:01      48.550 2013-12-28 13:00:00
4 28/12/2013 13:18:01      49.546 2013-12-28 13:00:00
5 28/12/2013 13:23:01      49.546 2013-12-28 13:00:00
6 28/12/2013 13:28:01      49.546 2013-12-28 13:00:00
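
The format string above assumes 24-hour times, as in the sample data. The question also mentions times written as hh:mm:ss p.m./a.m.; if a file really stores them that way, one possible approach (a sketch, not part of the original answer) is to normalise the marker to AM/PM and parse with %I and %p. The sample values below are made up, and %p matching depends on the locale, so this assumes an English or C locale:

```r
# Hypothetical 12-hour times in the style mentioned in the question
raw_time <- c("01:03:01 p.m.", "11:58:01 a.m.", "12:05:00 a.m.")

# Drop the dots and upper-case the marker: "01:03:01 PM", "11:58:01 AM", ...
clean_time <- toupper(gsub(".", "", raw_time, fixed = TRUE))

# %I is the 12-hour clock hour and %p the AM/PM indicator (locale-dependent)
stamp <- as.POSIXct(paste("28/12/2013", clean_time),
                    format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")

# 24-hour hour of each timestamp, as a two-digit string
hour24 <- format(stamp, "%H")
```

Once the timestamps are parsed this way, the cut() step works exactly as above.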

Now aggregate by that hourly column:

means <- aggregate(temperature ~ datehour, df, mean)
head(means)
             datehour temperature
1 2013-12-28 13:00:00    49.17192
2 2013-12-28 14:00:00    52.23470
3 2013-12-28 15:00:00    50.77000

plot(as.POSIXct(means$datehour), means$temperature, type="l", las=1,
     main="Hourly Avg Temperatures", xlab="Hour", ylab="")
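
The datehour column keeps the date, so with several days of data you get one mean per hour per day. If what you want instead is a single mean per clock hour across all days (one reading of "daily variation"), a sketch along these lines might do. The df2 data and the hour_of_day column are made up for illustration:

```r
# Hypothetical two-day sample in the same layout as the question's data
df2 <- data.frame(
  date = c("28/12/2013", "28/12/2013", "29/12/2013", "29/12/2013"),
  hour = c("13:03:01", "14:08:01", "13:13:01", "14:18:01"),
  temperature = c(40, 50, 44, 54),
  stringsAsFactors = FALSE
)

# Parse date and time together, fixing the time zone so the hour is stable
stamp <- as.POSIXct(paste(df2$date, df2$hour),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Keep only the clock hour (0-23), discarding the date
df2$hour_of_day <- as.integer(format(stamp, "%H"))

# One mean per clock hour, pooled over all days
by_hour <- aggregate(temperature ~ hour_of_day, df2, mean)
```

Here by_hour has one row for hour 13 and one for hour 14, each averaging the readings from both days.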

For time series data, though, I prefer the xts package:

library(xts)  # requires the xts package; install.packages("xts") if needed
df.xts <- xts(df$temperature, as.POSIXct(paste(df$date, df$hour),
      format="%d/%m/%Y %H:%M:%S"))
head(df.xts)
                      [,1]
2013-12-28 13:03:01 41.572
2013-12-28 13:08:01 46.059
2013-12-28 13:13:01 48.550
2013-12-28 13:18:01 49.546
2013-12-28 13:23:01 49.546
2013-12-28 13:28:01 49.546

means <- period.apply(df.xts, endpoints(df.xts, "hours"), mean)
head(means)
                        [,1]
2013-12-28 13:58:01 49.17192
2013-12-28 14:48:01 52.23470
2013-12-28 15:01:01 50.77000

Notice that each timestamp is that of the last observation in its hour. We can align the timestamps down to the beginning of the hour with this function:

# align.time() rounds up to the next n-second boundary, so shift the index
# back by n seconds first to round down instead.
align.time.down <- function(x, n) {
  index(x) <- index(x) - n
  align.time(x, n)
}
means.rounded <- align.time.down(means, 60*60)  # n = 3600 seconds = 1 hour

head(means.rounded)
                        [,1]
2013-12-28 13:00:00 49.17192
2013-12-28 14:00:00 52.23470
2013-12-28 15:00:00 50.77000

plot(means.rounded, las=1, main="Hourly Avg Temperatures")
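
Since the question mentions many files, here is a rough sketch of how the base-R approach could be wrapped in a function and applied to every file in a folder. The function name hourly_means, the file names, and the file layout (whitespace-separated with a header row, as in the sample) are assumptions for illustration; the demo writes two throwaway files to a temporary directory so that it runs on its own:

```r
# Hourly means for one data frame with date, hour and temperature columns
hourly_means <- function(df) {
  stamp <- as.POSIXct(paste(df$date, df$hour),
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  df$datehour <- cut(stamp, breaks = "hour")
  aggregate(temperature ~ datehour, df, mean)
}

# Demo input: two small made-up files in a temporary directory
demo_dir <- tempfile("temps")
dir.create(demo_dir)
writeLines(c("date hour temperature",
             "28/12/2013 13:03:01 41.0",
             "28/12/2013 13:33:01 43.0",
             "28/12/2013 14:03:01 50.0"),
           file.path(demo_dir, "logger_a.txt"))
writeLines(c("date hour temperature",
             "29/12/2013 09:10:00 20.0",
             "29/12/2013 09:40:00 22.0",
             "29/12/2013 10:05:00 30.0"),
           file.path(demo_dir, "logger_b.txt"))

# Apply the same steps to every file and keep the results in a named list
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, function(f) {
  hourly_means(read.table(f, header = TRUE, stringsAsFactors = FALSE))
})
names(results) <- basename(files)
```

Each element of results is a data frame of hourly means for one file, so results[["logger_a.txt"]] can be plotted the same way as means above. For real data you would point list.files() at your own folder and adjust the read.table() arguments to match your files.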


Mark Rajcok answered Oct 09 '22 21:10