How to filter out NULL elements of tibble's list column

Tags: r, dplyr

I have a tibble like the one below:

library(dplyr)

df <- structure(list(id = 1:11, var1 = c("A", "C", "B", "B", "B", "A", 
"B", "C", "C", "C", "B"), var2 = list(NULL, NULL, NULL, structure(list(
    x = c(0, 1, 23, 3), y = c(0.75149005651474, 0.149892757181078, 
    0.695984086720273, 0.0247649133671075)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame")), NULL, NULL, 
    NULL, NULL, NULL, NULL, NULL)), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame"))

I'd like to keep only the rows where var2 is not NULL. But the simple !is.null() doesn't work: df %>% filter(!is.null(var2)) returns the whole df. Why is that, and how can I filter out the rows with NULL in the var2 column?
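A minimal base-R sketch of what filter() receives here, using a hypothetical plain list `var2` in place of the real list column (so dplyr isn't needed). is.null() looks at the column as a whole rather than at each element, so it returns one value, and filter() recycles that single TRUE across all rows.

```r
# Hypothetical stand-in for the var2 list column: one data frame among NULLs
var2 <- list(NULL, NULL, NULL, data.frame(x = c(0, 1, 23, 3)), NULL)

# is.null() is not vectorised: it checks the whole list, which is not NULL,
# so it returns a single FALSE rather than one value per element
whole_column <- is.null(var2)
print(length(whole_column))  # 1
print(!whole_column)         # TRUE

# filter() recycles a length-1 condition to every row, so this mimics what
# !is.null(var2) amounts to inside filter(): every row is kept
kept <- rep(!whole_column, length(var2))
print(sum(kept))             # 5
```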

jakes asked Aug 03 '19 07:08


3 Answers

One possibility, also involving purrr, could be:

library(dplyr)
library(purrr)

df %>%
  filter(!map_lgl(var2, is.null))  # map_lgl() applies is.null() to each cell of var2

     id var1  var2            
  <int> <chr> <list>          
1     4 B     <tibble [4 × 2]>
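For comparison, a sketch of the same per-cell test without purrr, assuming base R's vapply() as a stand-in for map_lgl() (both return one logical per list element). The small data frame below is hypothetical toy data, not the question's df.

```r
# Hypothetical toy data frame with a list column, built in base R
df <- data.frame(id = 1:5)
df$var2 <- list(NULL, NULL, NULL, data.frame(x = c(0, 1, 23, 3)), NULL)

# vapply() applies is.null() to each cell and requires exactly one logical
# per cell, much like purrr::map_lgl()
null_cells <- vapply(df$var2, is.null, logical(1))
print(null_cells)  # TRUE TRUE TRUE FALSE TRUE

# Keep the rows whose var2 cell is not NULL
result <- df[!null_cells, , drop = FALSE]
print(result$id)   # 4
```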

Because is.null() is not vectorised (it returns a single value for the whole column), you can also evaluate it one row at a time with rowwise():

df %>%
 rowwise() %>%
 filter(!is.null(var2))
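Under rowwise(), filter() evaluates its condition once per row, so is.null() sees a single cell instead of the whole column. A rough base-R picture of that per-row evaluation, written as an explicit loop over a hypothetical stand-in for var2 (an illustration of the idea, not of how dplyr implements rowwise()):

```r
# Hypothetical stand-in for the var2 list column
var2 <- list(NULL, NULL, NULL, data.frame(x = c(0, 1, 23, 3)), NULL)

keep <- logical(length(var2))
for (i in seq_along(var2)) {
  # var2[[i]] is a single cell, which is what is.null() sees under rowwise()
  keep[i] <- !is.null(var2[[i]])
}
print(which(keep))  # 4
```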
tmfmnk answered Sep 29 '22 20:09


The function drop_na() from tidyr also drops NULL cells of a list column. Just be careful with the edge case where the column holds both NULL and NA values and, for some reason, you only want to drop the NULL ones.

(In the tidyr reference, drop_na() is documented as "Drop rows containing missing values".)

library(tidyr)

df %>% 
  drop_na(var2)

#        id var1  var2                
#     <int> <chr> <list>              
#   1     4 B     <tibble[,2] [4 x 2]>
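To illustrate the NULL-versus-NA edge case, a base-R sketch on a hypothetical list `cells` holding both kinds: is.null() flags only the NULL cells, while is.na() on a list flags only cells that are a single NA, so the two tests can be combined or used separately depending on what you want to drop. This does not reproduce drop_na()'s internals.

```r
# Hypothetical cells: NULL, a scalar NA, a data frame, and another NULL
cells <- list(NULL, NA, data.frame(x = 1:2), NULL)

null_cells <- vapply(cells, is.null, logical(1))  # TRUE FALSE FALSE TRUE
na_cells   <- is.na(cells)                        # FALSE TRUE FALSE FALSE

# Drop only the NULL cells, keeping the NA one
only_null_dropped <- cells[!null_cells]
print(length(only_null_dropped))  # 2

# Drop both the NULL and the NA cells
both_dropped <- cells[!(null_cells | na_cells)]
print(length(both_dropped))       # 1
```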
Adam answered Sep 29 '22 20:09


!is.null() doesn't work because your var2 is a nested list (a list of lists): it contains a tibble as its fourth element, and a tibble is a list because it is a data.frame. is.null() checks only the first level of the nested list, i.e. the var2 column as a whole, which is not NULL.

#show that the tibble is a list:
> is.list(df$var2[[4]])
[1] TRUE

You can try filtering on lengths(df$var2) > 0, since lengths() returns the length of each cell:

> lengths(df$var2)
 [1] 0 0 0 2 0 0 0 0 0 0 0  
# each column of the tibble in var2[[4]] is one element of the list
# var2[[4]], so var2[[4]] is a list of length two

> df %>% filter(lengths(var2) > 0)
# A tibble: 1 x 3
     id var1  var2            
  <int> <chr> <list>          
1     4 B     <tibble [4 x 2]>
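One difference worth keeping in mind: lengths() > 0 is not quite the same test as !is.null(). A data frame's length is its number of columns, and zero-length cells such as character(0) or a data frame with no columns also have length 0, so a lengths() filter drops them too. A base-R sketch with a hypothetical list `cells`:

```r
# Hypothetical cells: NULL, an empty character vector,
# a two-column data frame, and a data frame with no columns
cells <- list(
  NULL,
  character(0),
  data.frame(x = c(0, 1, 23, 3), y = 1:4),
  data.frame()
)

print(lengths(cells))                      # 0 0 2 0
print(vapply(cells, is.null, logical(1)))  # TRUE FALSE FALSE FALSE

# Only the two-column data frame survives lengths() > 0,
# whereas !is.null() would also keep character(0) and data.frame()
print(sum(lengths(cells) > 0))                   # 1
print(sum(!vapply(cells, is.null, logical(1))))  # 3
```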
Grada Gukovic answered Sep 29 '22 22:09