Can I unnest a list column directly into n columns?
The list can be assumed to be regular, with all elements of equal length.
If I had a character vector instead of a list column, I could use tidyr::separate. I can tidyr::unnest, but then I need another helper variable to be able to tidyr::spread. Am I missing an obvious method?
Example data:
    library(tibble)
    # tibble() replaces the deprecated data_frame()
    df1 <- tibble(
      gr = c('a', 'b', 'c'),
      values = list(1:2, 3:4, 5:6)
    )
    # A tibble: 3 x 2
      gr    values
      <chr> <list>
    1 a     <int [2]>
    2 b     <int [2]>
    3 c     <int [2]>
Goal:
    df2 <- tibble(
      gr = c('a', 'b', 'c'),
      V1 = c(1, 3, 5),
      V2 = c(2, 4, 6)
    )
    # A tibble: 3 x 3
      gr       V1    V2
      <chr> <dbl> <dbl>
    1 a        1.    2.
    2 b        3.    4.
    3 c        5.    6.
Current method:
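    library(dplyr)
    library(tidyr)
    unnest(df1, values) %>%                       # one row per list element
      group_by(gr) %>%
      mutate(r = paste0('V', row_number())) %>%   # helper key: V1, V2, ...
      spread(r, values)                           # back to one row per gr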
The tidyr package in R is used to tidy data. Its unnest() function expands a list-column into regular rows and columns; you pass it the data frame and the column(s) to unnest. The output is produced in the form of a tibble.
Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
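To make the nest/unnest round trip concrete, here is a minimal sketch. The data frame `long` and its contents are made up for illustration, and the `nest(data = ...)` spelling assumes tidyr 1.0.0 or later.

```r
library(tidyr)

# Hypothetical long-format data: two rows per group
long <- tibble(
  gr = c("a", "a", "b", "b"),
  values = 1:4
)

# Nesting: one row per value of the non-nested column `gr`;
# `data` becomes a list-column of one-column tibbles
nested <- nest(long, data = values)

# Unnesting flattens the list-column back into regular rows
flat <- unnest(nested, data)
```

Here `nested` should have one row per group, and `flat` should have the same rows as `long`.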
For comparison, the SQL UNNEST function (not to be confused with tidyr::unnest) returns a result table that includes a row for each element of the specified array. If multiple ordinary array arguments are specified, the number of rows matches the array with the largest cardinality.
With tidyr 1.0.0 you can use unnest_wider():
    library(tidyr)
    df1 <- tibble(
      gr = c('a', 'b', 'c'),
      values = list(1:2, 3:4, 5:6)
    )
    unnest_wider(df1, values)
    #> New names:
    #> * `` -> ...1
    #> * `` -> ...2
    #> New names:
    #> * `` -> ...1
    #> * `` -> ...2
    #> New names:
    #> * `` -> ...1
    #> * `` -> ...2
    #> # A tibble: 3 x 3
    #>   gr     ...1  ...2
    #>   <chr> <int> <int>
    #> 1 a         1     2
    #> 2 b         3     4
    #> 3 c         5     6
Created on 2019-09-14 by the reprex package (v0.3.0)
The output is verbose here because the elements that were unnested horizontally (the vector elements) were not named, and unnest_wider doesn't want to guess silently.
We can name them beforehand to avoid it:
    df1 %>%
      dplyr::mutate(values = purrr::map(values, setNames, c("V1","V2"))) %>%
      unnest_wider(values)
    #> # A tibble: 3 x 3
    #>   gr       V1    V2
    #>   <chr> <int> <int>
    #> 1 a         1     2
    #> 2 b         3     4
    #> 3 c         5     6
Or just use suppressMessages() or purrr::quietly().
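Another option that may be worth trying is the names_sep argument of unnest_wider(). The sketch below assumes a tidyr release newer than 1.0.0, in which (as far as I know) unnamed elements are numbered automatically when names_sep is supplied. On older versions the generated names may differ, so treat this as a sketch and not a guaranteed result.

```r
library(tidyr)

df1 <- tibble(
  gr = c("a", "b", "c"),
  values = list(1:2, 3:4, 5:6)
)

# names_sep joins the list-column name and an element index,
# so the new columns are prefixed with "values".
# Assumption: a recent tidyr that auto-numbers unnamed elements.
res <- unnest_wider(df1, values, names_sep = "_")
names(res)
```

If the columns must be called V1 and V2 as in the goal, they can be renamed afterwards, or the elements can be named up front as shown earlier.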