Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unstack lubridate's interval class

I am trying to transform a dataframe df consisting of a value-column, two date columns (start and end), and a interval-column (duration) into the long format by unnesting/unstacking the duration column.

library(dplyr)
library(lubridate)

df <- data.frame(value = letters[1:3], start = as_date(1:3), end = as_date(3:1)+3) %>% 
          mutate(duration = interval(start, end))

The expected outcome would a dataframe where value, start, and end are duplicated for each day as defined by duration. For instance, the value 'a' would appear 6 times each time on a different day (2nd, 3rd, 4th, 5th, 6th, 7th January 1970).

I tried to use the unnest function from the tidyr package but nothing happened.

tidyr::unnest(df, duration) 

Any help is greatly appreciated :)

like image 663
Ben Nutzer Avatar asked Dec 09 '22 23:12

Ben Nutzer


2 Answers

You can't unstack a column of intervals and expect it to generate all dates in between, but by using seq you can generate them yourself. Try this:

library(tidyverse)
library(lubridate)

df %>%
  rowwise() %>% 
  summarise(
    value, dates = seq(start, end, by = 1)
  )

#> # A tibble: 12 x 2
#>    value dates     
#>    <chr> <date>    
#>  1 a     1970-01-02
#>  2 a     1970-01-03
#>  3 a     1970-01-04
#>  4 a     1970-01-05
#>  5 a     1970-01-06
#>  6 a     1970-01-07
#>  7 b     1970-01-03
#>  8 b     1970-01-04
#>  9 b     1970-01-05
#> 10 b     1970-01-06
#> 11 c     1970-01-04
#> 12 c     1970-01-05

Created on 2021-05-18 by the reprex package (v1.0.0)

like image 83
Peter H. Avatar answered Jan 05 '23 12:01

Peter H.


You can also use the following solution. Since we are going to create duplicate rows we can wrap the operation in a list and then use unnest_longer. purrr package functions have always been my first choice but you can also use this as an alternative.

library(dplyr)
library(tidyr)
library(lubridate)


df %>% 
  group_by(value) %>%
  mutate(date = list(start + 0:(duration/ddays(1)))) %>%
  unnest_longer(date) %>%
  select(-duration)


# A tibble: 12 x 4
# Groups:   value [3]
   value start      end        date      
   <chr> <date>     <date>     <date>    
 1 a     1970-01-02 1970-01-07 1970-01-02
 2 a     1970-01-02 1970-01-07 1970-01-03
 3 a     1970-01-02 1970-01-07 1970-01-04
 4 a     1970-01-02 1970-01-07 1970-01-05
 5 a     1970-01-02 1970-01-07 1970-01-06
 6 a     1970-01-02 1970-01-07 1970-01-07
 7 b     1970-01-03 1970-01-06 1970-01-03
 8 b     1970-01-03 1970-01-06 1970-01-04
 9 b     1970-01-03 1970-01-06 1970-01-05
10 b     1970-01-03 1970-01-06 1970-01-06
11 c     1970-01-04 1970-01-05 1970-01-04
12 c     1970-01-04 1970-01-05 1970-01-05
like image 28
Anoushiravan R Avatar answered Jan 05 '23 11:01

Anoushiravan R