Create dataframe with month start and end in R

Tags:

r

tidyverse

I want to create a dataframe from a given start and end date:

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2020-06-23")

Each row of this dataframe should hold the start day and end day of one month in the range, so the expected output is:

start       end         month   year
2020-05-17  2020-05-31  May     2020
2020-06-01  2020-06-23  June    2020

I have tried to create a sequence, but I'm stuck on what to do next:

day_seq <- seq(start_date, end_date, 1)

A base R or tidyverse solution would be greatly appreciated.

asked Sep 05 '25 02:09 by Alexis


2 Answers

1) yearmon. Using start_date and end_date from the question, create a yearmon sequence; each of the desired columns is then a simple one-line computation. The stringsAsFactors line can be omitted in R 4.0.0 and later, where FALSE is the default.

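library(zoo)

# one yearmon value per month, from start_date's month to end_date's month
ym <- seq(as.yearmon(start_date), as.yearmon(end_date), 1/12)

data.frame(start = pmax(start_date, as.Date(ym)),         # first of month, but not before start_date
           end = pmin(end_date, as.Date(ym, frac = 1)),   # last of month, but not after end_date
           month = month.name[cycle(ym)],                 # cycle() gives the month number 1-12
           year = as.integer(ym),                         # integer part of a yearmon is the year
           stringsAsFactors = FALSE)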

giving:

       start        end month year
1 2020-05-17 2020-05-31   May 2020
2 2020-06-01 2020-06-23  June 2020
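The start and end columns rely on zoo's as.Date method for yearmon objects: with the default frac = 0 it returns the first day of the month, and with frac = 1 the last day. A small illustration, assuming the zoo package is installed; the two dates are arbitrary examples and firsts/lasts are illustrative names only:

```r
library(zoo)

# Two months, one of them February of a leap year (example dates only)
ym <- as.yearmon(as.Date(c("2020-02-10", "2020-12-05")))

firsts <- as.Date(ym)            # frac = 0 (default): first day of each month
lasts  <- as.Date(ym, frac = 1)  # frac = 1: last day of each month

print(data.frame(first = firsts, last = lasts))
```

February 2020 should come out as 2020-02-01 to 2020-02-29, so leap years are handled without any special casing.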

2) Base R. This follows similar logic and gives the same answer. We first define a function month1 which, given a Date vector x, returns a Date vector of the same length holding the first day of each date's month.

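# first day of the month for each date in x
month1 <- function(x) as.Date(cut(x, "month"))

# first day of every month from start_date's month to end_date's month
months <- seq(month1(start_date), month1(end_date), "month")
data.frame(start = pmax(start_date, months),
           end = pmin(end_date, month1(months + 31) - 1),  # first of next month minus one day
           month = format(months, "%B"),                    # full month name (locale dependent)
           year = as.numeric(format(months, "%Y")),
           stringsAsFactors = FALSE)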
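The end column uses month1(months + 31) - 1: adding 31 days to the first of any month always lands somewhere in the following month (no month is longer than 31 days), so taking that month's first day and subtracting one day gives the last day of the original month. A quick base R check over every month of the leap year 2020, reusing month1 from the answer (firsts, lasts and days_in_each are illustrative names):

```r
# Helper from the answer: first day of the month for each Date in x
month1 <- function(x) as.Date(cut(x, "month"))

# First day of every month in 2020 (a leap year)
firsts <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)

# Last day of each month via the "+ 31 days" trick
lasts <- month1(firsts + 31) - 1

# Month lengths implied by the trick (difference in days, inclusive)
days_in_each <- as.integer(lasts - firsts) + 1L
print(setNames(days_in_each, month.abb))
```

February should report 29 days and December should end on 2020-12-31, which shows the trick also works across a year boundary.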
answered Sep 07 '25 08:09 by G. Grothendieck


It has been a while since I used the tidyverse, but here is my go at it.

sample data

Different sample data, to tackle some problems where the year changes:

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-06-23")

code

library( tidyverse )
library( lubridate )
#create a sequence of days from start to end
tibble( date = seq( start_date, end_date, by = "1 day" ) ) %>%
  mutate( month = lubridate::month( date ),
          year = lubridate::year( date ),
          end = as.Date( paste( year, month, lubridate::days_in_month(date), sep = "-" ) ) ) %>%
  #the month end computed for the last month can be later than the maximum date... repair!
  mutate( end = if_else( end > max(date), max(date), end ) ) %>%
  group_by( year, month ) %>%
  summarise( start = min( date ), 
             end = max( end ) ) %>%
  select( start, end, month, year )

output

# # A tibble: 14 x 4
# # Groups:   year [2]
# start      end        month  year
# <date>     <date>     <dbl> <dbl>
# 1 2020-05-17 2020-05-31     5  2020
# 2 2020-06-01 2020-06-30     6  2020
# 3 2020-07-01 2020-07-31     7  2020
# 4 2020-08-01 2020-08-31     8  2020
# 5 2020-09-01 2020-09-30     9  2020
# 6 2020-10-01 2020-10-31    10  2020
# 7 2020-11-01 2020-11-30    11  2020
# 8 2020-12-01 2020-12-31    12  2020
# 9 2021-01-01 2021-01-31     1  2021
# 10 2021-02-01 2021-02-28     2  2021
# 11 2021-03-01 2021-03-31     3  2021
# 12 2021-04-01 2021-04-30     4  2021
# 13 2021-05-01 2021-05-31     5  2021
# 14 2021-06-01 2021-06-23     6  2021
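For comparison, the same "sequence of days, then group by month" idea used here, which is also where the question's day_seq attempt was heading, can be sketched in base R without extra packages. This is a minimal sketch under a few assumptions: days are grouped by a "%Y-%m" key (month_key and groups are illustrative names), and month names come from base R's English month.name vector rather than the locale, to match the question's expected output.

```r
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-06-23")

# One element per day, as in the question's attempt
day_seq <- seq(start_date, end_date, by = "day")

# Group the days by year-month, e.g. "2020-05"; zero-padded keys sort chronologically
month_key <- format(day_seq, "%Y-%m")
groups <- split(day_seq, month_key)

# The first and last day present in each group are that row's start and end
res <- data.frame(
  start = do.call(c, lapply(groups, min)),
  end   = do.call(c, lapply(groups, max)),
  month = month.name[as.integer(substr(names(groups), 6, 7))],
  year  = as.integer(substr(names(groups), 1, 4)),
  row.names = NULL
)
print(res)
```

Building one element per day is simple but grows with the length of the range; the month-based sequences in the first answer avoid materialising every day.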
answered Sep 07 '25 10:09 by Wimpel