How to subset data.frame by weeks and then sum?

Let's say I have several years' worth of data that look like the following:

# load date package and set random seed
library(lubridate)
set.seed(42)

# create data.frame of dates and income
date <- seq(dmy("26-12-2010"), dmy("15-01-2011"), by = "days")
df <- data.frame(date = date, 
                 wday = wday(date),
                 wday.name = wday(date, label = TRUE, abbr = TRUE),
                 income = round(runif(21, 0, 100)),
                 week = format(date, format="%Y-%U"),
                 stringsAsFactors = FALSE)

#          date wday wday.name income    week
# 1  2010-12-26    1       Sun     91 2010-52
# 2  2010-12-27    2       Mon     94 2010-52
# 3  2010-12-28    3      Tues     29 2010-52
# 4  2010-12-29    4       Wed     83 2010-52
# 5  2010-12-30    5     Thurs     64 2010-52
# 6  2010-12-31    6       Fri     52 2010-52
# 7  2011-01-01    7       Sat     74 2011-00
# 8  2011-01-02    1       Sun     13 2011-01
# 9  2011-01-03    2       Mon     66 2011-01
# 10 2011-01-04    3      Tues     71 2011-01
# 11 2011-01-05    4       Wed     46 2011-01
# 12 2011-01-06    5     Thurs     72 2011-01
# 13 2011-01-07    6       Fri     93 2011-01
# 14 2011-01-08    7       Sat     26 2011-01
# 15 2011-01-09    1       Sun     46 2011-02
# 16 2011-01-10    2       Mon     94 2011-02
# 17 2011-01-11    3      Tues     98 2011-02
# 18 2011-01-12    4       Wed     12 2011-02
# 19 2011-01-13    5     Thurs     47 2011-02
# 20 2011-01-14    6       Fri     56 2011-02
# 21 2011-01-15    7       Sat     90 2011-02

I would like to sum 'income' for each week (Sunday through Saturday). Currently I do the following:

Week ending 2011-01-01 = sum(df$income[1:7]) = 487
Week ending 2011-01-08 = sum(df$income[8:14]) = 387
Week ending 2011-01-15 = sum(df$income[15:21]) = 443

However, I would like a more robust approach that sums by week automatically. I can't work out how to subset the data into weeks without hard-coding the row indices. Any help would be much appreciated.

asked Jul 09 '12 13:07

Tony Breyal

2 Answers

First use format to convert your dates to week numbers, then plyr::ddply() to calculate the summaries:

library(plyr)
df$week <- format(df$date, format="%Y-%U")
ddply(df, .(week), summarize, income=sum(income))
     week income
1 2011-52    413
2 2012-01    435
3 2012-02    379

For more information on format.Date(), see ?strptime, particularly the part that defines %U as the week of the year (00 to 53), with Sunday as the first day of the week.
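A quick check of how %U behaves at the turn of the year may help here. The sketch below uses base R only and three dates taken from the question's table. It shows that the Sunday-to-Saturday week containing 2011-01-01 is split into "2010-52" and "2011-00", so grouping on this label alone does not give one group per calendar week.

```r
# Three consecutive days around New Year, taken from the question's data
d <- as.Date(c("2010-12-31", "2011-01-01", "2011-01-02"))

# %U counts weeks within a year, starting each week on Sunday;
# days before the first Sunday of the year fall in week 00
week_label <- format(d, format = "%Y-%U")
print(week_label)
# "2010-52" "2011-00" "2011-01"
```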

EDIT:

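Given the modified data and requirement, one way is to divide the date by 7 to get a number indicating the week. More precisely, divide the number of days since the epoch (1970-01-01 by default) by 7 to get the number of whole weeks since the epoch. If date is a POSIXct rather than a Date, as.numeric() returns seconds, so divide by the number of seconds in a week (3600*24*7) instead.

In code:

# whole weeks since 1970-01-01, converted back to the date on which each week starts
# (assumes df$date is a Date; for POSIXct divide by 3600*24*7 instead of 7)
df$week <- as.Date("1970-01-01") + 7 * trunc(as.numeric(df$date) / 7)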
library(plyr)
ddply(df, .(week), summarize, income=sum(income))

        week income
1 2010-12-23    298
2 2010-12-30    392
3 2011-01-06    294
4 2011-01-13    152

I have not checked that the week boundaries are on Sunday. You will have to check this, and insert an appropriate offset into the formula.
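Since 1970-01-01 was a Thursday, weeks counted from the epoch start on Thursdays. A minimal sketch of one possible offset follows, assuming date is a Date (days since the epoch) and using the income values printed in the question. The names sunday_offset and week_start are illustrative only. Shifting by 3 days moves the boundary to Sunday, because 1970-01-04 was the first Sunday after the epoch.

```r
# Data copied from the question's printed output (base R only, no lubridate)
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)
df <- data.frame(date = date, income = income)

# 1970-01-01 was a Thursday, so day 3 (1970-01-04) was the first Sunday
sunday_offset <- 3

# Days since the epoch, shifted so that every Sunday is a multiple of 7
days <- as.numeric(df$date) - sunday_offset

# Sunday on which each row's week starts
df$week_start <- as.Date(7 * floor(days / 7) + sunday_offset,
                         origin = "1970-01-01")

# One row per week with the summed income
weekly <- aggregate(income ~ week_start, data = df, FUN = sum)
print(weekly)
#   week_start income
# 1 2010-12-26    487
# 2 2011-01-02    387
# 3 2011-01-09    443
```

Adding 6 to week_start gives the Saturday on which each week ends, which matches the "week ending" labels used in the question.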

answered Oct 28 '22 12:10

Andrie

This is now simple using dplyr. Also I would suggest using cut(breaks = "week") rather than format() to cut the dates into weeks.

library(dplyr)
df %>% group_by(week = cut(date, "week")) %>% mutate(weekly_income = sum(income))
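Two details are worth checking when using cut() this way: by default cut() on dates starts weeks on Monday (start.on.monday = TRUE), and mutate() keeps every row rather than returning one row per week. A minimal base R sketch, assuming the question's data and Sunday-to-Saturday weeks (weekly is an illustrative name):

```r
# Data copied from the question's printed output
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# start.on.monday = FALSE makes each week run Sunday to Saturday;
# the factor levels are the dates on which each week starts
week <- cut(date, breaks = "week", start.on.monday = FALSE)

# Sum income within each week
weekly <- tapply(income, week, sum)
print(weekly)
# 2010-12-26 2011-01-02 2011-01-09 
#        487        387        443 
```

In the dplyr pipeline, the same cut() call can be passed to group_by(), followed by summarise(income = sum(income)) to get one row per week.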

answered Oct 28 '22 14:10

Jim

Tags: datetime, dataframe, r