Creating nested data frames with variable dimensions

Question

I have a data frame in which one column, keys, describes the format of all the value columns that follow it. In the example below there are two such value columns, but in general there may be many more.

library(tidyverse)

dat = tribble(
  ~id, ~keys,    ~vals1,   ~vals2,
  1,    "A/B",   "1/2",   "11/12",
  3,    "C/D/E", "6/7/8", "16"
)

I would like to transform these columns into a single column of nested data frames: in each row the values should be split on "/" and form the rows of a data frame, with headers taken from the keys entry.

Entries in the value columns may be truncated, in which case NAs should be used for the missing values (i.e., the entry "16" in the example should be interpreted as "16/NA/NA").
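The NA-padding rule maps directly onto base R: assigning a longer length to a vector with `length<-` fills the new positions with NA. A small illustration using the "16" entry from the example (the variable names are only for illustration):

```r
library(stringr)

key_names <- str_split("C/D/E", fixed("/"))[[1]]  # c("C", "D", "E")
vals <- str_split("16", fixed("/"))[[1]]          # "16"

# Growing a vector via `length<-` fills the new slots with NA
length(vals) <- length(key_names)
vals  # c("16", NA, NA)
```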

The following code produces the desired column for this particular case:

res = dat %>%
  mutate_at(vars(keys:last_col()), str_split, pattern = fixed("/")) %>%
  mutate(df = pmap(select(., keys:last_col()),
                   ~ bind_rows(setNames(..2, ..1[1:length(..2)]),
                               setNames(..3, ..1[1:length(..3)]))))
res$df
#> [[1]]
#> # A tibble: 2 x 2
#>   A     B    
#>   <chr> <chr>
#> 1 1     2    
#> 2 11    12   
#> 
#> [[2]]
#> # A tibble: 2 x 3
#>   C     D     E    
#>   <chr> <chr> <chr>
#> 1 6     7     8    
#> 2 16    <NA>  <NA>

My question is how to generalise to larger (and unknown) numbers of columns. Also, my use of setNames feels rather clumsy, and I was hoping for something a bit more elegant.

I am primarily looking for a tidyverse solution, but other approaches are welcome.
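One way to generalise the `pmap()` idea to any number of value columns is to let `keys` match by name and collect every other column through `...`, then do the splitting in a helper. This is a minimal sketch, not a definitive implementation: `split_to_df()` is a hypothetical helper name, and it assumes every column after `keys` is a value column and that a value never has more fields than its keys (extra fields would be silently dropped by `length<-`).

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

dat <- tribble(
  ~id, ~keys,    ~vals1,  ~vals2,
  1,   "A/B",   "1/2",   "11/12",
  3,   "C/D/E", "6/7/8", "16"
)

# Hypothetical helper: one keys string plus a character vector of value
# strings -> a tibble with one row per value string and one column per key
split_to_df <- function(keys, vals) {
  key_names <- str_split(keys, fixed("/"))[[1]]
  pieces <- map(str_split(vals, fixed("/")), function(v) {
    length(v) <- length(key_names)  # pad truncated entries with NA
    v
  })
  m <- do.call(rbind, pieces)       # character matrix, one row per value column
  colnames(m) <- key_names
  as_tibble(m)
}

res <- dat %>%
  # `keys` is matched by name; all later columns are gathered by `...`
  mutate(df = pmap(select(., keys:last_col()),
                   function(keys, ...) split_to_df(keys, unname(c(...))))) %>%
  select(id, df)
```

Because the value columns arrive through `...`, the same call should work unchanged whether there are two value columns or twenty, and no `setNames()` juggling is needed per column.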

Update

I should have emphasised that the output I'm looking for is a single data frame, with columns id (unchanged) and df (a list column of nested data frames).

(The original keys/values columns are not important; they may be removed.)

Here is the desired structure for the example above:

str(res %>% select(id, df))
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
#>  $ id: num  1 3
#>  $ df:List of 2
#>   ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of  2 variables:
#>   .. ..$ A: chr  "1" "11"
#>   .. ..$ B: chr  "2" "12"
#>   ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of  3 variables:
#>   .. ..$ C: chr  "6" "16"
#>   .. ..$ D: chr  "7" NA
#>   .. ..$ E: chr  "8" NA

akrun · Accepted Answer

Here is another option, which reshapes the data first:

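library(dplyr)
library(tidyr)
library(purrr)
library(stringr)  # str_split() and fixed()

dat %>%
  # long format: one row per (id, value column); "\\d" needs a double backslash in R
  pivot_longer(matches("vals\\d+")) %>%
  select(-id) %>%
  # one column per distinct keys string ("A/B", "C/D/E")
  pivot_wider(names_from = keys, values_from = value) %>%
  select(-name) %>%
  # split into a list of one-column tibbles
  split.default(seq_along(.)) %>%
  # split each column on "/" into the names encoded in its header
  map(~ .x %>%
        separate(names(.), into = unlist(str_split(names(.), fixed("/"))),
                 sep = "/"))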
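The answer above returns a plain list with one data frame per distinct keys string and drops id, so it does not produce the id/df structure asked for in the Update; it also relies on each keys string appearing only once, since otherwise `pivot_wider()` would create list-columns. A minimal sketch of a reshape that keeps id, assuming all value columns are character and that any column other than id and keys is a value column (the intermediate names `row`, `key` and `pieces` are arbitrary):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tibble)

dat <- tribble(
  ~id, ~keys,    ~vals1,  ~vals2,
  1,   "A/B",   "1/2",   "11/12",
  3,   "C/D/E", "6/7/8", "16"
)

res <- dat %>%
  # one row per (id, value column)
  pivot_longer(-c(id, keys), names_to = "row", values_to = "value") %>%
  mutate(
    key    = str_split(keys, fixed("/")),
    pieces = str_split(value, fixed("/")),
    # pad each split value with NA up to the number of keys
    value  = map2(pieces, key, function(v, k) { length(v) <- length(k); v })
  ) %>%
  select(id, row, key, value) %>%
  # one row per (id, value column, key)
  unnest(c(key, value)) %>%
  # gather everything belonging to one id into a nested tibble
  nest(df = c(row, key, value)) %>%
  # inside each tibble: one row per value column, one column per key
  mutate(df = map(df, ~ .x %>%
                    pivot_wider(names_from = key, values_from = value) %>%
                    select(-row)))
```

Working in long format means the number of value columns never appears in the code, and repeated keys strings across ids are not a problem because each id is widened separately.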

Tags: r, nested, dplyr, purrr

Asked by Magnus