
Using R, for each year, I need to sum the sales between the same two dates

Tags: r, dplyr

For two different years, for each year, I need to sum all the sales that occurred from January 3 through March 3. I would prefer a dplyr solution.

All the possible solutions I found on Stack Overflow used SQL, not R. If someone knows of a solution I missed, please let me know.

In R, I know how to work with groups and how to use a variety of dplyr functions, but I don't know how to apply the same month-and-day window to each year.

I would like the output to look like this:

Year   Total Sales
2020   138 
2021   196

Input

df <- data.frame(date=c(20200102, 20200107, 20200210, 20200215, 20200216, 20200302, 20200305, 20210101, 20210104, 20210209, 20210211, 20210215, 20210317, 20210322),
                  sales=c(9,14,27,30,33,34,36,44,45,47,51,53,56,58))
Metsfan asked Dec 07 '22 09:12


1 Answer

One line shorter than my master akrun's solution :)

  1. With the ymd function of the lubridate package, convert the numeric date column to the Date class.
  2. Create a DayMonth column that keeps only the month and day ("mm-dd"), so the same interval can be applied to every year.
  3. Group by year.
  4. Filter to the desired interval.
  5. Summarise.
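library(dplyr)
library(lubridate)

df %>% 
    # numeric yyyymmdd -> Date
    mutate(date = ymd(date)) %>% 
    # zero-padded "mm-dd" strings sort chronologically within a year
    mutate(DayMonth = format(date, "%m-%d")) %>% 
    group_by(Year = year(date)) %>% 
    # January 3 through March 3, both endpoints included
    filter(DayMonth >= "01-03" & DayMonth <= "03-03") %>% 
    summarise(Total_Sales = sum(sales))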

Output:

   Year Total_Sales
  <int>       <dbl>
1  2020         138
2  2021         196
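
If you would rather not load lubridate, the same idea can be sketched with integer arithmetic alone. This is only a variation under one assumption: that date always holds a numeric yyyymmdd value, as in the sample input. The column names Year and MonthDay and the object name result are made up for this illustration. between() is dplyr's inclusive range helper, so both January 3 and March 3 count.

```r
library(dplyr)

# Sample input from the question
df <- data.frame(
  date  = c(20200102, 20200107, 20200210, 20200215, 20200216, 20200302, 20200305,
            20210101, 20210104, 20210209, 20210211, 20210215, 20210317, 20210322),
  sales = c(9, 14, 27, 30, 33, 34, 36, 44, 45, 47, 51, 53, 56, 58)
)

result <- df %>%
  mutate(
    Year     = date %/% 10000,  # integer division: 20200107 -> 2020
    MonthDay = date %% 10000    # remainder:        20200107 -> 107 (January 7)
  ) %>%
  # keep January 3 (103) through March 3 (303), endpoints included
  filter(between(MonthDay, 103, 303)) %>%
  group_by(Year) %>%
  summarise(Total_Sales = sum(sales))

print(result)
```

For the sample data this gives the same totals as above (138 for 2020 and 196 for 2021). It would not work if date were already a Date or a character column; in that case the format() approach is the safer one.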
TarJae answered Apr 06 '23 01:04