Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get all possible combinations in a time-series data with variable daily readings

Tags:

r

dplyr

purrr

I have a time-series dataset of daily consumption which looks like the following:

consumption <- data.frame(
  date = as.Date(c('2020-06-01','2020-06-02','2020-06-03','2020-06-03',
                   '2020-06-03','2020-06-04','2020-06-05','2020-06-05')),
  val = c(10,20,31,32,33,40,51,52)
)

consumption <- consumption %>%
  group_by(date) %>%
  mutate(n = n(), record = row_number()) %>%
  ungroup()

consumption

# A tibble: 8 × 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    31     3      1
4 2020-06-03    32     3      2
5 2020-06-03    33     3      3
6 2020-06-04    40     1      1
7 2020-06-05    51     2      1
8 2020-06-05    52     2      2

Some days have more than one rows in the dataset. I would like to transform this into split groups with all possible combinations such as:

Group 1:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  31      1
4 2020-06-04  40      1
5 2020-06-05  51      1

Group 2:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  31      1
4 2020-06-04  40      1
5 2020-06-05  52      2

Group 3:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  32      2
4 2020-06-04  40      1
5 2020-06-05  51      1

Group 4:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  32      2
4 2020-06-04  40      1
5 2020-06-05  52      2

Group 5:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  33      3
4 2020-06-04  40      1
5 2020-06-05  51      1

Group 6:

        date val record
1 2020-06-01  10      1
2 2020-06-02  20      1
3 2020-06-03  33      3
4 2020-06-04  40      1
5 2020-06-05  52      2

I've tried the following solution, but it does not produce the desired results.

library(dplyr)
library(purrr)
out <- consumption %>% 
   filter(n > 1) %>%
    group_split(date, rn = row_number()) %>% 
    map(~ bind_rows(consumption %>%
          filter(n == 1), .x %>%
             select(-rn)) %>% 
         arrange(date))

Your help to getting around this would be much appreciated.

Many thanks,

like image 501
M.Qasim Avatar asked Dec 31 '22 13:12

M.Qasim


1 Answers

We could filter where the 'record' is greater than 1, group_split by 'row_number' and 'date', then bind the rows with the filtered data where the 'record' is 1

library(dplyr)
library(purrr)
out <- consumption %>% 
   filter(n > 1) %>%
    group_split(date, rn = row_number()) %>% 
    map(~ bind_rows(consumption %>%
          filter(n == 1), .x %>%
             select(-rn)) %>% 
         arrange(date))

-output

> out
[[1]]
# A tibble: 4 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    31     3      1
4 2020-06-04    40     1      1

[[2]]
# A tibble: 4 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    32     3      2
4 2020-06-04    40     1      1

[[3]]
# A tibble: 4 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    33     3      3
4 2020-06-04    40     1      1

With the updated data, we create the row_number(), then split it by 'date' column (as in @ThomasIsCoding solution), use crossing (from purrr) to expand the data, and loop over the rows with pmap, slice the rows of the original data based on the row index

library(tidyr)
library(tibble)
consumption %>%
     transmute(date, rn = row_number()) %>%
     deframe %>%
     split(names(.)) %>%
     invoke(crossing, .) %>%
     pmap(~ consumption %>% 
        slice(c(...))) %>%
     unname

-output

[[1]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    31     3      1
4 2020-06-04    40     1      1
5 2020-06-05    51     2      1

[[2]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    31     3      1
4 2020-06-04    40     1      1
5 2020-06-05    52     2      2

[[3]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    32     3      2
4 2020-06-04    40     1      1
5 2020-06-05    51     2      1

[[4]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    32     3      2
4 2020-06-04    40     1      1
5 2020-06-05    52     2      2

[[5]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    33     3      3
4 2020-06-04    40     1      1
5 2020-06-05    51     2      1

[[6]]
# A tibble: 5 x 4
  date         val     n record
  <date>     <dbl> <int>  <int>
1 2020-06-01    10     1      1
2 2020-06-02    20     1      1
3 2020-06-03    33     3      3
4 2020-06-04    40     1      1
5 2020-06-05    52     2      2
like image 177
akrun Avatar answered Jan 13 '23 13:01

akrun