For each of two different years, I need to sum all the sales that occurred from January 3 through March 3. I would prefer a dplyr solution.
All the possible solutions I found on Stack Overflow used SQL, not R. If someone knows of a solution I missed, please let me know.
In R, I know how to work with groups and to use a variety of dplyr functions, but I need help doing what this post is about.
I would like the output to look like this:
Year Total Sales
2020 138
2021 196
Input
df <- data.frame(date=c(20200102, 20200107, 20200210, 20200215, 20200216, 20200302, 20200305, 20210101, 20210104, 20210209, 20210211, 20210215, 20210317, 20210322),
sales=c(9,14,27,30,33,34,36,44,45,47,51,53,56,58))
One row less than my master akrun's solution :)
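The ymd function from the lubridate package converts the date column (stored here as numbers such as 20200102) to Date class.
DayMonth is a helper column that keeps only the month and day as a zero-padded "mm-dd" string, so the desired interval can be filtered by month and day regardless of the year.
The year function extracts the year from each date, which is used for grouping.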
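library(dplyr)
library(lubridate)
df %>%
  mutate(date = ymd(date)) %>%                   # numeric yyyymmdd -> Date
  mutate(DayMonth = format(date, "%m-%d")) %>%   # zero-padded "mm-dd" string
  group_by(Year = year(date)) %>%
  # >= and <= keep January 3 and March 3 themselves ("from ... through")
  filter(DayMonth >= "01-03" & DayMonth <= "03-03") %>%
  summarise(Total_Sales = sum(sales))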
Output:
Year Total_Sales
<int> <dbl>
1 2020 138
2 2021 196
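For comparison, here is a minimal sketch of a variant that compares real dates instead of "mm-dd" strings. It is not the method from the answer above, just one possible alternative: it assumes lubridate's make_date() is used to build the January 3 and March 3 boundaries for each row's own year, and the name result is only illustrative. Both boundaries are treated as inclusive.

```r
library(dplyr)
library(lubridate)

# Same input data as in the question
df <- data.frame(
  date  = c(20200102, 20200107, 20200210, 20200215, 20200216, 20200302, 20200305,
            20210101, 20210104, 20210209, 20210211, 20210215, 20210317, 20210322),
  sales = c(9, 14, 27, 30, 33, 34, 36, 44, 45, 47, 51, 53, 56, 58)
)

result <- df %>%
  mutate(date = ymd(date), Year = year(date)) %>%
  # Keep rows from January 3 through March 3 (inclusive) of each row's own year
  filter(date >= make_date(Year, 1, 3),
         date <= make_date(Year, 3, 3)) %>%
  group_by(Year) %>%
  summarise(Total_Sales = sum(sales))

print(result)
```

With the sample data this should give the same totals as above (138 for 2020 and 196 for 2021). Comparing Date values avoids relying on string ordering, at the cost of one extra column.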