I have a data.frame in R where one column is a list of dates (many of which are duplicates) and the other column is a temperature recorded on that date. The columns in question look like this (the real data has several thousand rows and a few other unnecessary columns):
Date | Temp
-----------------
1/2/13 34.4
1/2/13 36.4
1/2/13 34.3
1/4/13 45.6
1/4/13 33.5
1/5/13 45.2
I need to find a way of getting a daily average for temperature. So ideally, I could tell R to loop through the data.frame and for every date that matched, give me an average for the temperature that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.
I know I can pull out a single column and average it (i.e. mean(data.frame[[2]])), but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.
Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven-day rolling average, i.e. if my date range starts at 1/1/13, I'd get an average for all temps taken between 1/1/13 and 1/7/13, and then between 1/8/13 and 1/14/13, and so on...
Any assistance helping me grasp R loops is much appreciated. Thank you!
EDIT
Here's the output of dput(head(my.dataframe))
PLEASE NOTE: I edited down both "date" and "timestamp" because they both go on for several thousand entries otherwise:
structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L,
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L,
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L,
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013",
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013",
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00",
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00",
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00",
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146,
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID",
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA,
6L), class = "data.frame")
You can use the floor_date() function from the lubridate package in R to quickly group data by month.
To calculate the average of a data frame column in R, use the mean() function. mean() takes a numeric vector, so pass it the column itself (for example, mean(df$Temp)) rather than just the column name.
To calculate monthly averages for a time series object, we can use the tapply() function with mean(). For example, if we have a monthly time series object called TimeData, then the average for each calendar month (pooled across all years in the series) can be found with the command tapply(TimeData, cycle(TimeData), mean).
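As a small illustration of the tapply()/cycle() idiom, here is a sketch that uses the built-in AirPassengers monthly series as a stand-in for TimeData (an assumption; it is not the asker's data). It also shows aggregate() on a ts object for one mean per year, in case that is what is wanted instead.

```r
# AirPassengers ships with base R (datasets package): monthly values, 1949-1960.
# It is used here only as a stand-in for the TimeData object mentioned above.
TimeData <- AirPassengers

# cycle() gives each observation's position within the year (1 = Jan, ..., 12 = Dec),
# so this yields one mean per calendar month, pooled over all years.
monthly_mean <- tapply(TimeData, cycle(TimeData), mean)

# The ts method of aggregate() collapses the series to a lower frequency;
# nfrequency = 1 gives one mean per year.
yearly_mean <- aggregate(TimeData, nfrequency = 1, FUN = mean)

print(monthly_mean)
print(yearly_mean)
```

Note that neither call groups by individual month-and-year pairs; for that, the series would first need a proper date index.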
library(plyr)
ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))
This is a simple example of the Split-Apply-Combine paradigm.
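To make the three steps visible, here is a minimal base-R sketch of the same computation done by hand. The data frame name mydf and the result name daily are just illustrative choices; the rows are the sample from the question.

```r
# Sample data copied from the question.
mydf <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Split: one vector of temperatures per distinct date.
temps_by_date <- split(mydf$Temp, mydf$Date)

# Apply: take the mean of each piece.
means <- sapply(temps_by_date, mean)

# Combine: put the results back into a data frame.
daily <- data.frame(
  Date = names(means),
  daily_mean_Temp = unname(means),
  stringsAsFactors = FALSE
)
print(daily)
```

No explicit loop is needed: split() does the matching of rows to dates, and sapply() visits each group once.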
Alternative #1: as Ananda Mahto mentions, the dplyr package is a higher-performance rewrite of plyr. He shows the syntax.
Alternative #2: aggregate() is also functionally equivalent; it just has fewer bells and whistles than plyr/dplyr.
Additionally, regarding 'generate an average for every 7 calendar days': do you mean 'average-by-week-of-year', or 'moving 7-day average (trailing/leading/centered)'?
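Until that is clarified, here is a rough base-R sketch of two possible readings: (a) fixed, non-overlapping 7-day blocks counted from the first date in the data, and (b) a trailing 7-day moving average computed over the raw readings. It assumes the dates are m/d/yy strings as in the sample; the last two rows and all object names are made up for illustration.

```r
mydf <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13",
           "1/9/13", "1/10/13"),  # last two rows are invented for illustration
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2, 40.0, 42.0),
  stringsAsFactors = FALSE
)
# Parse the strings into real Date objects so date arithmetic works.
mydf$Date <- as.Date(mydf$Date, format = "%m/%d/%y")

# (a) Fixed 7-day blocks: block 0 covers days 0-6 after the start, block 1 days 7-13, ...
start <- min(mydf$Date)
mydf$block <- as.integer(mydf$Date - start) %/% 7
block_mean <- aggregate(Temp ~ block, data = mydf, FUN = mean)
block_mean$block_start <- start + 7 * block_mean$block

# (b) Trailing window: for each calendar day d, average every reading
# taken in [d - 6, d], however many readings each day has.
days <- seq(min(mydf$Date), max(mydf$Date), by = "day")
rolling <- data.frame(
  Date = days,
  mean7 = sapply(seq_along(days), function(i) {
    d <- days[i]
    # mean() of an empty window would be NaN; that cannot happen in this sample.
    mean(mydf$Temp[mydf$Date >= d - 6 & mydf$Date <= d])
  })
)

print(block_mean)
print(rolling)
```

Version (b) weights every reading equally, so days with more readings count for more; averaging the daily means instead would be a different, equally defensible choice.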
Here are a few options:
aggregate(Temp ~ Date, mydf, mean)
# Date Temp
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000
library(dplyr)
mydf %>% group_by(Date) %>% summarise(mean(Temp))
# Source: local data frame [3 x 2]
#
# Date mean(Temp)
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000
library(data.table)
DT <- data.table(mydf)
DT[, mean(Temp), by = Date]
# Date V1
# 1: 1/2/13 35.03333
# 2: 1/4/13 39.55000
# 3: 1/5/13 45.20000
library(xts)
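# Give the format explicitly; without it "1/2/13" is parsed as year 1, February 13
dfX <- xts(mydf$Temp, as.Date(mydf$Date, format = "%m/%d/%y"))
apply.daily(dfX, mean)
# [,1]
# 2013-01-02 35.03333
# 2013-01-04 39.55000
# 2013-01-05 45.20000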
Since you are dealing with dates, you should explore the xts package, which will give you access to functions like apply.daily, apply.weekly, apply.monthly and so on, which will let you conveniently aggregate your data.
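One thing to watch with any of the options above: in the dput() output, DATE is a factor of m/d/Y strings, so grouping on it directly sorts the results alphabetically ("10/1/2013" before "6/10/2013") rather than chronologically. A minimal sketch of converting it first, using a tiny stand-in for the asker's data (the TEMP values here are invented):

```r
# Hypothetical miniature of the asker's data: DATE is a factor in m/d/Y form,
# as in the dput() output. The TEMP values are invented for illustration.
my.dataframe <- data.frame(
  DATE = factor(c("10/1/2013", "10/1/2013", "6/10/2013", "6/10/2013", "9/9/2013")),
  TEMP = c(15.0, 17.0, 23.5, 24.5, 20.0)
)

# as.character() makes the conversion from factor labels explicit;
# %Y is a four-digit year (the short sample in the question would need %y).
my.dataframe$DATE <- as.Date(as.character(my.dataframe$DATE), format = "%m/%d/%Y")

# Grouping on a real Date column returns the daily means in chronological order.
daily <- aggregate(TEMP ~ DATE, data = my.dataframe, FUN = mean)
print(daily)
```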