I am splitting a data.frame into a list based on its column names. I want the id column (id) to be included in every element of the resulting list, not just one.
At present I do this by binding the id column to each list item afterwards with map and bind_cols (I could do the same with Map, do.call, mapply, etc.). Is there a canonical way to do this directly, perhaps through an argument of split.default or some other function, that saves those two or three extra steps?
Reproducible example
df <- data.frame(
  stringsAsFactors = FALSE,
  id = c("A", "B", "C"),
  nm1_a = c(928L, 476L, 928L),
  nm1_b = c(61L, 362L, 398L),
  nm2_a = c(965L, 466L, 369L),
  nm2_b = c(240L, 375L, 904L),
  nm3_a = c(429L, 730L, 788L),
  nm3_b = c(99L, 896L, 540L),
  nm3_c = c(463L, 143L, 870L)
)
df
#>   id nm1_a nm1_b nm2_a nm2_b nm3_a nm3_b nm3_c
#> 1  A   928    61   965   240   429    99   463
#> 2  B   476   362   466   375   730   896   143
#> 3  C   928   398   369   904   788   540   870
What I am doing presently
library(tidyverse)
split.default(df[-1], gsub('^(nm\\d+).*', '\\1', names(df)[-1])) %>%
  map(~ .x %>% bind_cols('id' = df$id, .))
#> $nm1
#>   id nm1_a nm1_b
#> 1  A   928    61
#> 2  B   476   362
#> 3  C   928   398
#>
#> $nm2
#>   id nm2_a nm2_b
#> 1  A   965   240
#> 2  B   466   375
#> 3  C   369   904
#>
#> $nm3
#>   id nm3_a nm3_b nm3_c
#> 1  A   429    99   463
#> 2  B   730   896   143
#> 3  C   788   540   870
I want exactly the same output. Is there a more direct or more canonical way to get it?
Just for a diversity of options, here's what you said you didn't want to do. The pivot / split / pivot method can scale better and does not depend on identifying the ID by column position. It also uses the ID itself to do the reshaping, so it may be more flexible if you have other operations in the intermediate steps and can't be sure your row order will stay the same; that's one of the reasons I sometimes avoid binding columns. It also (at least for me) makes more sense to split data based on some variable rather than by groups of columns.
library(tidyr)
df %>%
  pivot_longer(-id) %>%
  split(stringr::str_extract(.$name, "^nm\\d+")) %>%
  purrr::map(pivot_wider, id_cols = id, names_from = name)
#> $nm1
#> # A tibble: 3 x 3
#>   id    nm1_a nm1_b
#>   <chr> <int> <int>
#> 1 A       928    61
#> 2 B       476   362
#> 3 C       928   398
#>
#> $nm2
#> # A tibble: 3 x 3
#>   id    nm2_a nm2_b
#>   <chr> <int> <int>
#> 1 A       965   240
#> 2 B       466   375
#> 3 C       369   904
#>
#> $nm3
#> # A tibble: 3 x 4
#>   id    nm3_a nm3_b nm3_c
#>   <chr> <int> <int> <int>
#> 1 A       429    99   463
#> 2 B       730   896   143
#> 3 C       788   540   870
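If you would rather stay in base R and avoid both the reshape and the bind step, one possible sketch is to split the column names instead of the columns, then index the original data frame with id plus each group of names. This is not something from the question or the answer above, just an illustration; value_cols, groups, and out are names made up for it, and it assumes every non-id column follows the nm<digits>_<suffix> pattern from the example.

```r
# Same example data as in the question.
df <- data.frame(
  stringsAsFactors = FALSE,
  id = c("A", "B", "C"),
  nm1_a = c(928L, 476L, 928L),
  nm1_b = c(61L, 362L, 398L),
  nm2_a = c(965L, 466L, 369L),
  nm2_b = c(240L, 375L, 904L),
  nm3_a = c(429L, 730L, 788L),
  nm3_b = c(99L, 896L, 540L),
  nm3_c = c(463L, 143L, 870L)
)

# Every column except id, selected by name rather than by position.
value_cols <- setdiff(names(df), "id")

# Group label per column, e.g. "nm1_a" -> "nm1"
# (assumption: all value columns look like nm<digits>_<suffix>).
groups <- sub("^(nm\\d+).*", "\\1", value_cols)

# Split the column *names* by group, then pull id plus those columns
# straight from df, so id never has to be bound back on afterwards.
out <- lapply(
  split(value_cols, groups),
  function(cols) df[c("id", cols)]
)

names(out)
```

Because each element is taken from df in a single indexing step, the id values stay attached to their rows without relying on the row order of a separate object.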