I have a data frame in which one column, keys, describes the format of all remaining columns. In the example below there are two such value columns, but in general there may be many more.
library(tidyverse)
dat = tribble(
~id, ~keys, ~vals1, ~vals2,
1, "A/B", "1/2", "11/12",
3, "C/D/E", "6/7/8", "16"
)
I would like to transform these columns into a single column of nested data frames: in each row the values should be split on "/" and form the rows of a data frame, with headers taken from the keys entry.
Entries in the value columns may be truncated, in which case NAs should be used for the missing values (i.e., the entry "16" in the example should be interpreted as "16/NA/NA").
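To make the padding rule concrete, here is a minimal base R sketch. The helper name pad_split is hypothetical and only illustrates the rule; it is not part of the code I am actually using.

```r
# Hypothetical helper: split one entry on "/" and pad it with NA
# until it has n fields.
pad_split <- function(x, n) {
  parts <- strsplit(x, "/", fixed = TRUE)[[1]]
  length(parts) <- n  # extending a vector's length fills the new slots with NA
  parts
}

pad_split("16", 3)     # truncated entry, padded to three fields
pad_split("6/7/8", 3)  # complete entry, returned unchanged
```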
The following code produces the wanted column for this particular case:
res = dat %>%
  mutate_at(vars(keys:last_col()), str_split, pattern = fixed("/")) %>%
  mutate(df = pmap(select(., keys:last_col()),
                   ~ bind_rows(setNames(..2, ..1[1:length(..2)]),
                               setNames(..3, ..1[1:length(..3)]))))
res$df
#> [[1]]
#> # A tibble: 2 x 2
#> A B
#> <chr> <chr>
#> 1 1 2
#> 2 11 12
#>
#> [[2]]
#> # A tibble: 2 x 3
#> C D E
#> <chr> <chr> <chr>
#> 1 6 7 8
#> 2 16 <NA> <NA>
My question is how to generalise to larger (and unknown) numbers of columns. Also, my use of setNames feels rather clumsy, and I was hoping for something a bit more elegant.
I am primarily looking for a tidyverse solution, but other approaches are welcome.
I should have emphasised that the output I'm looking for is a single data frame, with columns id (unchanged) and df (a list of nested data frames).
(The original keys/values columns are not important; they may be removed.)
Here is the wanted structure in the above example:
str(res %>% select(id, df))
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 2 variables:
#> $ id: num 1 3
#> $ df:List of 2
#> ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 2 variables:
#> .. ..$ A: chr "1" "11"
#> .. ..$ B: chr "2" "12"
#> ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 3 variables:
#> .. ..$ C: chr "6" "16"
#> .. ..$ D: chr "7" NA
#> .. ..$ E: chr "8" NA
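Here is another option, which first reshapes the data with pivot_longer() and pivot_wider() and then splits each resulting column with separate(). Note that it returns a plain list of data frames rather than a data frame with id and df columns: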
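library(dplyr)
library(tidyr)
library(purrr)
library(stringr)  # provides str_split() and fixed()
dat %>%
  # stack the value columns: one row per (id, value column)
  pivot_longer(matches("vals\\d+")) %>%
  select(-id) %>%
  # one column per distinct keys string, here `A/B` and `C/D/E`
  pivot_wider(names_from = keys, values_from = value) %>%
  select(-name) %>%
  # break the result into a list of one-column data frames
  split.default(seq_along(.)) %>%
  # split each column on "/", naming the pieces after the keys in its header;
  # fill = "right" pads truncated entries with NA without a warning
  map(~ .x %>%
        separate(names(.), into = str_split(names(.), fixed("/")) %>%
                   unlist, sep = "[/]", fill = "right"))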
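The desired result keeps id alongside a list-column df. Below is a minimal sketch of one way to get that shape for any number of value columns. It assumes every value column name starts with "vals" and that no entry has more fields than its keys string. The helper name nest_row is hypothetical and not from the original post.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

dat <- tribble(
  ~id, ~keys,   ~vals1,  ~vals2,
  1,   "A/B",   "1/2",   "11/12",
  3,   "C/D/E", "6/7/8", "16"
)

# Hypothetical helper: build the nested data frame for one row of `dat`.
# `keys` is that row's keys string; `...` collects however many value
# strings the row has, so nothing is hard-coded to two columns.
nest_row <- function(keys, ...) {
  values <- unname(c(...))
  separate(
    tibble(value = values), value,
    into = strsplit(keys, "/", fixed = TRUE)[[1]],  # headers from keys
    sep = "/",
    fill = "right"  # pad truncated entries with NA on the right
  )
}

res <- dat %>%
  # pmap() passes each row's columns to nest_row() as named arguments
  mutate(df = pmap(select(., keys, starts_with("vals")), nest_row)) %>%
  select(id, df)

res$df
```

Because pmap() matches the keys column by name and passes everything else through `...`, adding a vals3 or vals4 column should not require any change to the code.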