Compute data.frame column averages by date

Tags: loops, for-loop, dataframe, r, average

I have a data.frame in R where one column holds dates (many of which are duplicates) and another column holds a temperature recorded on that date. The columns in question look like this (the real data has several thousand rows and a few other unneeded columns):

Date    |    Temp
-----------------
1/2/13     34.4
1/2/13     36.4
1/2/13     34.3
1/4/13     45.6
1/4/13     33.5
1/5/13     45.2

I need to find a way of getting a daily average for temperature. So ideally, I could tell R to loop through the data.frame and for every date that matched, give me an average for the temperature that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.

I know I can pull out a single column and average it (e.g. mean(my.dataframe[[2]])), but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.

Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven-day rolling average, i.e. if my date range starts at 1/1/13, I'd get an average for all temps taken between 1/1/13 and 1/7/13, then between 1/8/13 and 1/14/13, and so on.

Any assistance helping me grasp R loops is much appreciated. Thank you!

EDIT

Here's the output of dput(head(my.dataframe)). Please note: I trimmed the levels of both DATE and TIMESTAMP, because otherwise they go on for several thousand entries:

structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L, 
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L, 
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L, 
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013", 
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013", 
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00", 
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00", 
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00", 
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146, 
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID", 
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA, 
6L), class = "data.frame")

asked Apr 20 '14 06:04

TheNovice

2 Answers

library(plyr)

ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))

This is a simple example of the Split-Apply-Combine paradigm.
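To make the paradigm concrete, here is a minimal base R sketch of the same three steps done by hand, using the example rows from the question. The intermediate names (temps_by_date, daily_means, result) are only illustrative, and this is not how plyr works internally.

```r
# Example data copied from the question
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2)
)

# Split: one vector of temperatures per distinct date
temps_by_date <- split(df$Temp, df$Date)

# Apply: take the mean of each piece
daily_means <- sapply(temps_by_date, mean)

# Combine: put the results back into a data.frame
result <- data.frame(
  Date = names(daily_means),
  daily_mean_Temp = unname(daily_means)
)
result
```

No explicit loop is needed; sapply() visits each group for you.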

Alternative #1: as Ananda Mahto mentions, the dplyr package is a higher-performance rewrite of plyr. His answer shows the syntax.

Alternative #2: aggregate() from base R is functionally equivalent here; it just has fewer bells and whistles than plyr/dplyr.

As for 'generate an average for every 7 calendar days': do you mean an average by week of year, or a moving 7-day average (trailing, leading or centered)?
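Depending on the answer, either reading can be sketched in base R. The code below is only an illustration on the question's example rows: it assumes the dates are month/day/two-digit-year, and the names block_start, block_means and trailing_means are made up for this sketch. It shows fixed 7-day blocks counted from the first date (which is what the question's 1/1/13 to 1/7/13 example describes) and a trailing 7-day window.

```r
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2)
)
# Assumption: dates are month/day/two-digit year
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Reading 1: consecutive 7-day blocks counted from the first date.
# Days 0-6 after the start fall in the first block, days 7-13 in the next, ...
start <- min(df$Date)
df$block_start <- start + 7 * (as.numeric(df$Date - start) %/% 7)
block_means <- aggregate(Temp ~ block_start, data = df, FUN = mean)

# Reading 2: trailing 7-day window.
# For each distinct date d, average every reading taken from d - 6 to d.
days <- sort(unique(df$Date))
trailing_means <- data.frame(
  Date = days,
  Temp_7day = sapply(seq_along(days), function(i) {
    mean(df$Temp[df$Date >= days[i] - 6 & df$Date <= days[i]])
  })
)

block_means
trailing_means
```

Both versions average the raw readings, so a day with many entries carries more weight than a day with few. To weight every day equally, compute the daily means first and average those instead.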

answered Oct 13 '22 01:10

smci

Here are a few options:

aggregate(Temp ~ Date, mydf, mean)
#     Date     Temp
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000

library(dplyr)
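# %.% was the chain operator in early dplyr; it was later removed in favour of %>%
mydf %>% group_by(Date) %>% summarise(mean(Temp))
# Output as printed by dplyr at the time (newer versions print a tibble):
# Source: local data frame [3 x 2]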
# 
#     Date mean(Temp)
# 1 1/2/13   35.03333
# 2 1/4/13   39.55000
# 3 1/5/13   45.20000

library(data.table)
DT <- data.table(mydf)
DT[, mean(Temp), by = Date]
#      Date       V1
# 1: 1/2/13 35.03333
# 2: 1/4/13 39.55000
# 3: 1/5/13 45.20000

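library(xts)
# The format is needed: without it "1/2/13" is read as year 1, February 13
dfX <- xts(mydf$Temp, as.Date(mydf$Date, format = "%m/%d/%y"))
apply.daily(dfX, mean)
#                [,1]
# 2013-01-02 35.03333
# 2013-01-04 39.55000
# 2013-01-05 45.20000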

Since you are dealing with dates, you should explore the xts package. It gives you access to functions like apply.daily, apply.weekly, apply.monthly and so on, which let you conveniently aggregate your data by period.
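Whichever approach you pick, it helps to turn the DATE factor from your dput() into a real Date first, so that groups sort chronologically and date arithmetic works. The sketch below uses base R only. The column names and the m/d/Y date style follow the dput() in the question, but the rows are illustrative: the pairing of dates and temperatures is made up for this example.

```r
# Illustrative rows only: column names and date style follow the question's
# dput(), but which temperature belongs to which date is an assumption.
my.dataframe <- data.frame(
  DATE = factor(c("6/10/2013", "6/10/2013", "6/10/2013",
                  "6/11/2013", "6/11/2013", "6/11/2013")),
  TEMP = c(23.376, 23.376, 23.833, 24.146, 24.219, 24.05)
)

# Convert the factor to a Date. The format string is required because
# "6/10/2013" is not in the default year-month-day layout.
my.dataframe$DATE <- as.Date(as.character(my.dataframe$DATE),
                             format = "%m/%d/%Y")

# Daily means, sorted chronologically rather than alphabetically
daily <- aggregate(TEMP ~ DATE, data = my.dataframe, FUN = mean)
daily
```

The same converted column can serve as the index when building an xts object.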

answered Oct 13 '22 00:10

A5C1D2H2I1M1N2O1R2T1
