
separate data into columns given by another column in tidyr

Tags: r, tidyr

I am tidying data in which the desired column name mapping is given in a separate column, like so:

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"), 
                 type = c("A, B, C, D", "A, C, D"))

df looks like:

    splitme        type
 6, 7, 8, 9  A, B, C, D
      1,2,3     A, C, D

The desired output should look like:

desired_output <- data.frame(A = c(6,1), 
                             B = c(7, NA), 
                             C = c(8,2), 
                             D = c(9,3))

i.e.:

  A  B C D
  6  7 8 9
  1 NA 2 3

If it were not for the fact that some rows have missing types, this would be a straightforward task for tidyr::separate.

## Not correctly aligned
library(dplyr)
library(tidyr)
df %>% 
  tidyr::separate(splitme, into = c("A", "B", "C", "D")) %>% 
  select(-type)

However, the second row has only three values, so they land in A, B, C instead of A, C, D. If only the into argument could take a column specifying the split rule. Perhaps a purrr::pmap_df-based strategy could be used here?

cboettig asked Apr 03 '18 21:04





2 Answers

You can use separate_rows followed by a reshape with spread:

library(dplyr)
library(tidyr)
df %>% 
    # add a row identification number for reshaping purpose
    mutate(rn = row_number()) %>% 
    separate_rows(splitme, type) %>% 
    spread(type, splitme) %>% 
    select(-rn)

#  A    B C D
#1 6    7 8 9
#2 1 <NA> 2 3
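
Note that the printed `<NA>` indicates the split values are not numeric. As a rough sketch of the same idea for newer tidyr versions (assuming tidyr 1.0.0 or later, where `pivot_wider()` is available), the reshape can be written as follows. The explicit `sep`, `convert = TRUE`, and `stringsAsFactors = FALSE` are additions of this sketch, not part of the code above:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; strings kept as character (assumption)
df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

result <- df %>%
  # row id so each original row becomes one row again after reshaping
  mutate(rn = row_number()) %>%
  # split both columns in parallel on commas; convert = TRUE turns "6" into 6L
  separate_rows(splitme, type, sep = ",\\s*", convert = TRUE) %>%
  # one column per type; combinations that do not occur are filled with NA
  pivot_wider(names_from = type, values_from = splitme) %>%
  select(-rn)

result
```

With `convert = TRUE` the value columns should come back as integers rather than text, so the missing entry is a plain `NA`.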
Psidom answered Oct 02 '22 21:10



Using purrr::map2_dfr, instead of parsing the splitme column we use the string directly in a data.frame call. We name the columns, and map2_dfr binds the rows and deals with the missing values.

library(purrr)
map2_dfr(df$splitme,df$type,
         ~setNames(eval(parse(text=paste0("data.frame(",.x,")"))),
                   strsplit(.y,", ")[[1]]))
#   A  B C D
# 1 6  7 8 9
# 2 1 NA 2 3
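
If evaluating text as code with `eval(parse())` is a concern, for example when the strings come from an untrusted source, a possible variant splits both columns with `strsplit()` instead. This is only a sketch: it assumes the columns are character vectors, that every value in `splitme` is numeric, and that dplyr is installed (which `map2_dfr` needs for row binding).

```r
library(purrr)

# Same example data as in the question; strings kept as character (assumption)
df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

result <- map2_dfr(
  strsplit(df$splitme, ",\\s*"),  # list of value vectors, one per row
  strsplit(df$type, ",\\s*"),     # list of name vectors, one per row
  # build a one-row data frame whose columns are named by the types
  ~ as.data.frame(as.list(setNames(as.numeric(.x), .y)))
)

result
```

Row binding matches columns by name, so a type that is absent from a row ends up as `NA` in that row.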
Moody_Mudskipper answered Oct 02 '22 21:10
