Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

using tidyr unnest with NULL values

Tags:

r

tidyr

unnest

I converted a JSON file into a data.frame with a a nested list structure, which I would like to unnest and flatten. Some of the values in the list are NULL, which unnest does not accept. If I replace the NULL values with a data.frame structure that has only NA values, I get the desired result.

Below is a simplified example of my problem. I have tried to replace the NULL values with the NA data.frame but did not manage because of the the nested structure. How can I achieve the desired result?

Example

library(tidyr)
input1 <- data.frame(id = c("c", "d", "e"), value = c(7, 8, 9))
input2 <- NULL
input3 <- data.frame(id = c(NA), value = c(NA))

df <- dplyr::tibble(
a = c(1, 2),
b = list(a = input1, c = input2))  
unnest(df)

gives the error "Error: Each column must either be a list of vectors or a list of data frames [b]"

df2 <- dplyr::tibble(
a = c(1, 2),
b = list(a = input1, c = input3))  
unnest(df2)

gives the desired output.

like image 668
Michiel van Dijk Avatar asked Nov 10 '17 14:11

Michiel van Dijk


2 Answers

As noted by @Frank above, as of tidyr 1.0.0 this can be achieved using the keep_empty argument

library(tidyr)

input1 <- data.frame(id = c("c", "d", "e"), value = c(7, 8, 9))
input2 <- NULL

df <- tibble::tibble(
  a = c(1, 2),
  b = list(a = input1, c = input2)
)

unnest(df, cols = b, keep_empty = TRUE)
#> # A tibble: 4 x 3
#>       a id    value
#>   <dbl> <chr> <dbl>
#> 1     1 c         7
#> 2     1 d         8
#> 3     1 e         9
#> 4     2 NA       NA

Created on 2021-09-24 by the reprex package (v2.0.1)

like image 156
pyg Avatar answered Oct 23 '22 10:10

pyg


We can use map_lgl from purrr here. If you don't care about those rows with a NULL, you could simply remove them with filter and unnest:

library(tidyverse)

df %>% 
  filter(!map_lgl(b, is.null)) %>% 
  unnest() 
#> # A tibble: 3 x 3
#>       a     id value
#>   <dbl> <fctr> <dbl>
#> 1     1      c     7
#> 2     1      d     8
#> 3     1      e     9

In case you want to keep those rows, you could bring them back with right_join after unnesting:

df %>% 
  filter(!map_lgl(b, is.null)) %>% 
  unnest() %>% 
  right_join(select(df, a))
#> Joining, by = "a"
#> # A tibble: 4 x 3
#>       a     id value
#>   <dbl> <fctr> <dbl>
#> 1     1      c     7
#> 2     1      d     8
#> 3     1      e     9
#> 4     2   <NA>    NA

Data

input1 <- data.frame(id = c("c", "d", "e"), value = c(7, 8, 9))
input2 <- NULL
input3 <- data.frame(id = c(NA), value = c(NA))

df <- dplyr::tibble(
  a = c(1, 2),
  b = list(a = input1, c = input2)
)  
like image 44
Thomas K Avatar answered Oct 23 '22 08:10

Thomas K