
Canonical way to include one id column into all elements of resulting list from split.default

I am splitting a data.frame into a list on the basis of its column names. What I want is to include an id column (id) not just in one item but in all elements of the resulting list.

Presently I do this by binding the id column to each list element afterwards with map and bind_cols (I could do the same myself with alternatives such as Map, do.call or mapply). What I want to know is whether there is a canonical way of doing it directly, maybe with an argument of split.default or with some other function, thus saving two or three extra steps.

Reproducible example

df <- data.frame(
  stringsAsFactors = FALSE,
                id = c("A", "B", "C"),
             nm1_a = c(928L, 476L, 928L),
             nm1_b = c(61L, 362L, 398L),
             nm2_a = c(965L, 466L, 369L),
             nm2_b = c(240L, 375L, 904L),
             nm3_a = c(429L, 730L, 788L),
             nm3_b = c(99L, 896L, 540L),
             nm3_c = c(463L, 143L, 870L)
      )

df
#>   id nm1_a nm1_b nm2_a nm2_b nm3_a nm3_b nm3_c
#> 1  A   928    61   965   240   429    99   463
#> 2  B   476   362   466   375   730   896   143
#> 3  C   928   398   369   904   788   540   870

What I am doing presently

library(tidyverse)

split.default(df[-1], gsub('^(nm\\d+).*', '\\1', names(df)[-1])) %>%
  map(~ .x %>% bind_cols('id' = df$id, .))
#> $nm1
#>   id nm1_a nm1_b
#> 1  A   928    61
#> 2  B   476   362
#> 3  C   928   398
#> 
#> $nm2
#>   id nm2_a nm2_b
#> 1  A   965   240
#> 2  B   466   375
#> 3  C   369   904
#> 
#> $nm3
#>   id nm3_a nm3_b nm3_c
#> 1  A   429    99   463
#> 2  B   730   896   143
#> 3  C   788   540   870

What I want is exactly the same output, but is there any way to do it directly or a more canonical way?

AnilGoyal asked Jun 10 '21 02:06

1 Answer

Just for a diversity of options, here's what you said you didn't want to do. The pivot / split / pivot method can scale better, and it adapts beyond keeping an ID based only on column position. It also uses the ID to do the reshaping, so it may be more flexible if you have other operations to do in the intermediate steps and can't be sure that your row order will stay the same; that's one of the reasons I sometimes avoid binding columns. It also (at least to me) makes more sense to split data based on some variable rather than by groups of columns.

library(tidyr)

df %>%
  pivot_longer(-id) %>%
  split(stringr::str_extract(.$name, "^nm\\d+")) %>%
  purrr::map(pivot_wider, id_cols = id, names_from = name)
#> $nm1
#> # A tibble: 3 x 3
#>   id    nm1_a nm1_b
#>   <chr> <int> <int>
#> 1 A       928    61
#> 2 B       476   362
#> 3 C       928   398
#> 
#> $nm2
#> # A tibble: 3 x 3
#>   id    nm2_a nm2_b
#>   <chr> <int> <int>
#> 1 A       965   240
#> 2 B       466   375
#> 3 C       369   904
#> 
#> $nm3
#> # A tibble: 3 x 4
#>   id    nm3_a nm3_b nm3_c
#>   <chr> <int> <int> <int>
#> 1 A       429    99   463
#> 2 B       730   896   143
#> 3 C       788   540   870
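
For comparison, here is a minimal base R sketch of a different technique from the pivot approach above: split the column names rather than the columns, then select id together with each group by name. The helper names value_cols, groups and res are made up for illustration, and it assumes the ID column is literally called id and that every other column starts with an nm<digits> prefix, as in the question's example.

```r
# Same example data as in the question
df <- data.frame(
  id = c("A", "B", "C"),
  nm1_a = c(928L, 476L, 928L),
  nm1_b = c(61L, 362L, 398L),
  nm2_a = c(965L, 466L, 369L),
  nm2_b = c(240L, 375L, 904L),
  nm3_a = c(429L, 730L, 788L),
  nm3_b = c(99L, 896L, 540L),
  nm3_c = c(463L, 143L, 870L),
  stringsAsFactors = FALSE
)

# Every column except the ID column (assumed to be named "id")
value_cols <- setdiff(names(df), "id")

# Split the column *names* by their "nm<digits>" prefix
groups <- split(value_cols, sub("^(nm\\d+).*", "\\1", value_cols))

# Select id plus each group's columns by name; id is never detached from
# its rows, so nothing has to be bound back by position afterwards
res <- lapply(groups, function(cols) df[c("id", cols)])

res$nm1
#>   id nm1_a nm1_b
#> 1  A   928    61
#> 2  B   476   362
#> 3  C   928   398
```

Because id is picked up by name in the same subsetting step, this avoids the separate bind_cols call, though it is still a split followed by an lapply rather than a single split.default argument.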

camille answered Sep 30 '22 02:09