I want to calculate the average Dist for each week using the data below while preserving the benefits of using the POSIXct time class.
df <- structure(list(IndID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "AAA"),
Date = structure(c(1329436800, 1329458400, 1329480000, 1329501600,
1329523200, 1329544800, 1329566400, 1329588000, 1329609600,
1329631200, 1329652800, 1329674400, 1329696000, 1329717600,
1329739200, 1329760800, 1329782400, 1329804000, 1329825600,
1329847200, 1329868800, 1329890400, 1329912000, 1329933600,
1329955200, 1329976800, 1329998400, 1330020000, 1330041600,
1330063200, 1330084800, 1330106400, 1330128000, 1330149600,
1330171200, 1330192800, 1330214400, 1330236000, 1330257600,
1330279200, 1330300800, 1330322400, 1330344000, 1330365600,
1330387200, 1330408800, 1330430400, 1330452000, 1330473600,
1330495200), class = c("POSIXct", "POSIXt"), tzone = ""),
Dist = c(3.85567120344727, 52.2649622620809, 1043.61207930222,
1352.58506343616, 176.911523081261, 77.8266318470078, 50.3943567710686,
296.753649985307, 70.5826583995618, 166.394264991861, 251.745346701973,
295.70655057823, 44.6664731663839, 11.1539274078084, 124.578071475754,
757.728373470112, 83.0921234152083, 36.6820839851181, 29.1406161870034,
150.442928003814, 66.0957159105813, 2.23839297570488, 184.88312900824,
513.072526047611, 132.868335201626, 8.09274857805967, 284.479977841835,
479.358187122796, 297.273840894826, 4.00676616275076, 601.492189218489,
249.001525522847, 108.007775719885, 2.38435966274261, 604.365702677913,
1499.59076416313, 111.74722960012, 25.3528529967124, 280.057754683142,
428.157539641219, 70.0365608334965, 71.0886617898624, 265.823654634254,
380.247565078552, 188.857338305481, 9.24402933768915, 120.346786301264,
221.904294953242, 201.086079767386, 81.7857577639103), DoW = c(5,
5, 6, 6, 6, 6, 7, 7, 7, 7, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)), .Names = c("IndID", "Date",
"Dist", "DoW"), row.names = c(NA, -50L), class = "data.frame")
> head(df)
  IndID                Date        Dist DoW
1   AAA 2012-02-16 17:00:00    3.855671   5
2   AAA 2012-02-16 23:00:00   52.264962   5
3   AAA 2012-02-17 05:00:00 1043.612079   6
4   AAA 2012-02-17 11:00:00 1352.585063   6
5   AAA 2012-02-17 17:00:00  176.911523   6
6   AAA 2012-02-17 23:00:00   77.826632   6
My thought was to use the plyr package to average Dist by week. I first wanted to create a new WeekDate field that contains the date (excluding the time) of the first day of each week. As seen in the DoW (day of week) field, the data do not always begin on the first day of the week.
Although I cannot seem to connect the dots, I want the minimum Date (excluding h:m:s) for each sequential week (DoW 1-7).
That is, rows 1:10 would be 2012-02-16, rows 11:38 would be 2012-02-19, and rows 39:50 would be 2012-02-26.
I suspect the lubridate package will be helpful, but I cannot get the code right.
Any suggestions or alternative methods on the specific creation of a new date column or more broadly averaging Dist for every week would be appreciated.
Using dplyr, the bosom buddy of plyr, together with lubridate:
library(lubridate)
library(dplyr)
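# Floor each timestamp to the start of its week (Sunday by default), then average Dist per week
df %>%
  group_by(Week = floor_date(Date, unit = "week")) %>%
  summarize(WeeklyAveDist = mean(Dist))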
#Source: local data frame [3 x 2]
#
# Week WeeklyAveDist
#1 2012-02-12 381.7755
#2 2012-02-19 252.1116
#3 2012-02-26 175.4097
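Note that floor_date() returns the Sunday that starts each week, not the first observed date, and the grouping depends on the time zone of Date (tzone = "" means the session's local zone). The 381.7755 above is the mean of only the first 8 rows, which is what a UTC session produces; head(df) in the question suggests a UTC-7 zone, where the first week would hold rows 1:10. If you want the WeekDate described in the question (the earliest observed date in each week), here is a minimal base R sketch. The zone "America/Denver" is my assumption to match head(df), dat is a hypothetical name, and Dist is a stand-in (1:50), so the means printed by this sketch are not the real ones.

```r
# Timestamps from the question: 50 fixes, 6 hours apart.
# ASSUMPTION: "America/Denver" (UTC-7 in February) is picked to match head(df).
Date <- as.POSIXct(seq(1329436800, by = 21600, length.out = 50),
                   origin = "1970-01-01", tz = "America/Denver")

# ASSUMPTION: Dist is a stand-in (1:50), not the real distances.
dat <- data.frame(Date = Date, Dist = seq_len(50))

# Calendar day of each fix, taken in the time zone of the Date column
day <- as.Date(format(dat$Date, "%Y-%m-%d"))

# Sunday that starts each week: %w is 0 for Sunday through 6 for Saturday
week_start <- day - as.integer(format(dat$Date, "%w"))

# WeekDate = earliest observed day within each Sunday-to-Saturday week
dat$WeekDate <- as.Date(ave(as.numeric(day), as.character(week_start), FUN = min),
                        origin = "1970-01-01")

# Average Dist per WeekDate
weekly <- aggregate(Dist ~ WeekDate, data = dat, FUN = mean)
print(weekly)
```

With the real df you would skip the first two statements and use df in place of dat; the Date column stays POSIXct throughout, and WeekDate is a plain Date.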
There are also ceiling_date and round_date options.
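To see how the three rounding functions differ, here is a small sketch on a single timestamp. The value x is the first timestamp from the question, pinned to UTC (my assumption) so the result does not depend on the session's time zone.

```r
library(lubridate)

# First timestamp from the question, expressed in UTC (assumption: fixed zone for reproducibility)
x <- as.POSIXct("2012-02-17 00:00:00", tz = "UTC")

wk_floor   <- floor_date(x, unit = "week")    # start of the week containing x
wk_ceiling <- ceiling_date(x, unit = "week")  # start of the following week
wk_round   <- round_date(x, unit = "week")    # whichever week boundary is nearer

print(c(wk_floor, wk_ceiling, wk_round))
```

For grouping observations into weeks, floor_date is usually the one you want, since every timestamp in a week maps to the same start-of-week value.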