Separate data into columns given by another column in tidyr




I am tidying data in which the desired column name mapping is given in a separate column, like so:

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"), 
                 type = c("A, B, C, D", "A, C, D"))

df looks like:

     splitme       type
 6, 7, 8, 9       A, B, C, D
      1,2,3       A, C, D

The desired output should look like:

desired_output <- data.frame(A = c(6,1), 
                             B = c(7, NA), 
                             C = c(8,2), 
                             D = c(9,3))


  A  B C D
  6  7 8 9
  1 NA 2 3

If it were not for the fact that some rows have missing types, this would be a straightforward task for tidyr::separate.

## Not correctly aligned
library(tidyr)  # also re-exports %>%
df %>% 
  tidyr::separate(splitme, into = c("A", "B", "C", "D"))

but for rows with missing types the values end up in the wrong columns. If only the into argument could take a column specifying the split rule. Perhaps there is a purrr::pmap_df based strategy that could be used here?

2 Answers

You can use separate_rows followed by a reshape with spread:

library(dplyr); library(tidyr);
df %>% 
    # add a row identification number for reshaping purpose
    mutate(rn = row_number()) %>% 
    separate_rows(splitme, type) %>% 
    spread(type, splitme) %>% 
    # drop the helper column once the reshape is done
    select(-rn)

#  A    B C D
#1 6    7 8 9
#2 1 <NA> 2 3
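
The result above holds character values (note the <NA>), while the desired output is numeric. A possible variant, sketched here assuming tidyr 1.0.0 or later so that pivot_wider is available, uses the convert argument of separate_rows to turn the split values into integers. The name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  # row id so each original row becomes one row after reshaping
  mutate(rn = row_number()) %>%
  # convert = TRUE runs type conversion on the split pieces,
  # so splitme becomes integer while type stays character
  separate_rows(splitme, type, convert = TRUE) %>%
  # pivot_wider plays the role that spread plays above
  pivot_wider(names_from = type, values_from = splitme) %>%
  select(-rn)

wide
```

The missing B in the second row is filled with NA, as in the spread version.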

Using purrr::map2_dfr, instead of parsing the splitme column we use the string directly in a data.frame call. We name the columns, and map2_dfr binds the rows and deals with the missing values.

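library(purrr)
# assumes df$splitme and df$type are character, not factor
map2_dfr(df$splitme, df$type, ~setNames(
  eval(parse(text = paste0("data.frame(", .x, ")"))),  # e.g. data.frame(6, 7, 8, 9)
  strsplit(.y, ", ")[[1]]))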
#   A  B C D
# 1 6  7 8 9
# 2 1 NA 2 3
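
If evaluating strings as code is a concern, the same idea can be sketched by splitting both columns explicitly. This is only an illustration, assuming the values are always numeric and comma-separated; the names res, values, types, vals and nms are made up for the example.

```r
library(purrr)

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

res <- map2_dfr(df$splitme, df$type, function(values, types) {
  # split on commas, tolerating optional spaces around them
  vals <- as.numeric(strsplit(values, "\\s*,\\s*")[[1]])
  nms  <- strsplit(types, "\\s*,\\s*")[[1]]
  # named list -> one-row data.frame; map2_dfr row-binds and fills gaps with NA
  as.data.frame(as.list(setNames(vals, nms)))
})

res
```

Because the values are converted with as.numeric and never evaluated, a malformed entry becomes NA (with a warning) rather than being run as R code.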
