I've got a tibble like below:
df <- structure(list(id = 1:11, var1 = c("A", "C", "B", "B", "B", "A",
"B", "C", "C", "C", "B"), var2 = list(NULL, NULL, NULL, structure(list(
x = c(0, 1, 23, 3), y = c(0.75149005651474, 0.149892757181078,
0.695984086720273, 0.0247649133671075)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame")), NULL, NULL,
NULL, NULL, NULL, NULL, NULL)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
I'd like to keep only the rows where var2 is NOT NULL. But the simple !is.null() just doesn't work:

df %>% filter(!is.null(var2))

returns the whole df. Why is that, and how can I filter out all the rows with NULL in the var2 column?
One possibility, also involving purrr, could be:
library(dplyr)
library(purrr)

df %>%
  filter(!map_lgl(var2, is.null))
#      id var1  var2
#   <int> <chr> <list>
# 1     4 B     <tibble [4 × 2]>
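If you would rather avoid purrr, the same element-wise test can be written in base R with vapply(). This is a minimal sketch on a small hypothetical tibble (df here is made up to mirror the question's shape, not the original data):

```r
library(tibble)

# Hypothetical small tibble mirroring the question's shape:
# a list-column var2 where only one element is non-NULL
df <- tibble(
  id   = 1:3,
  var1 = c("A", "B", "C"),
  var2 = list(NULL, tibble(x = c(0, 1), y = c(0.5, 0.25)), NULL)
)

# vapply() applies is.null() to each element and guarantees a logical vector
keep <- !vapply(df$var2, is.null, logical(1))
result <- df[keep, ]
result
```

vapply() is used instead of sapply() because it always returns a logical vector of the expected length, even when the input has zero rows.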
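Because is.null() is not vectorised (applied to a whole column it returns a single TRUE or FALSE), you can also evaluate it row by row with rowwise(), where var2 refers to each row's element: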
df %>%
rowwise() %>%
filter(!is.null(var2))
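A quick check that the rowwise() version keeps the same rows as the map_lgl() version. This is a sketch on a hypothetical small tibble, and the trailing ungroup() is an optional extra to drop the rowwise grouping afterwards:

```r
library(dplyr)
library(purrr)

# Hypothetical small tibble with one non-NULL element in var2
df <- tibble(
  id   = 1:3,
  var2 = list(NULL, tibble(x = 1:2), NULL)
)

# Inside rowwise(), var2 refers to the single element of each row,
# so is.null() returns one TRUE/FALSE per row
by_row <- df %>%
  rowwise() %>%
  filter(!is.null(var2)) %>%
  ungroup()  # return a plain tibble instead of a rowwise_df

# The vectorised purrr version for comparison
by_map <- df %>% filter(!map_lgl(var2, is.null))
```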
The function drop_na() from tidyr, which drops rows containing missing values, also works for NULL. You just have to be careful with the edge case where you have both NULL and NA values and only want to drop the NULL ones for some reason.
library(tidyr)
df %>%
drop_na(var2)
# id var1 var2
# <int> <chr> <list>
# 1 4 B <tibble[,2] [4 x 2]>
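For that edge case, one option is to test for NULL explicitly instead of relying on drop_na(). A minimal sketch, assuming a hypothetical list-column that holds a NULL, an NA and a real value:

```r
library(dplyr)
library(purrr)

# Hypothetical list-column holding a NULL, an NA and a real value
df <- tibble(
  id   = 1:3,
  var2 = list(NULL, NA, tibble(x = 1:2))
)

# is.null(NA) is FALSE, so only the NULL entry is dropped; the NA row stays
only_null_dropped <- df %>% filter(!map_lgl(var2, is.null))
```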
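!is.null() doesn't work because your var2 is a nested list (a list of lists): its fourth element is a tibble, and a tibble is a list because it is a data.frame. is.null() checks only the first level of the nested list, i.e. the var2 column as a whole, which is a list and therefore not NULL. It returns a single FALSE, so !is.null(var2) is a single TRUE that filter() recycles to every row.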
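The behaviour can be reproduced on a small hypothetical tibble: is.null() on the whole column gives one value, and filter() recycles it to all rows.

```r
library(dplyr)

# Hypothetical small tibble with NULLs in the list-column
df <- tibble(id = 1:3, var2 = list(NULL, tibble(x = 1:2), NULL))

# is.null() looks at the var2 column as a whole: a list, not NULL
whole_column <- is.null(df$var2)   # a single FALSE

# filter() recycles the single TRUE (= !FALSE) to every row
kept <- df %>% filter(!is.null(var2))
nrow(kept)                         # all 3 rows are kept
```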
#show that the tibble is a list:
> is.list(df$var2[[4]])
[1] TRUE
Instead, you can filter on lengths(df$var2) > 0:
> lengths(df$var2)
[1] 0 0 0 2 0 0 0 0 0 0 0
# NULL has length 0, while each column of the tibble in var2[[4]] is one
# element of that list, so var2[[4]] has length two
> df %>% filter(lengths(var2) > 0)
# A tibble: 1 x 3
id var1 var2
<int> <chr> <list>
1 4 B <tibble [4 x 2]>
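The same lengths() test also works without dplyr, using plain subsetting. A sketch on a hypothetical small tibble:

```r
library(tibble)

# Hypothetical small tibble; the non-NULL element has two columns
df <- tibble(id = 1:3, var2 = list(NULL, tibble(x = 1:2, y = 3:4), NULL))

# lengths() returns the length of each element: 0 for NULL,
# the number of columns for a tibble
lengths(df$var2)   # 0 2 0
result <- df[lengths(df$var2) > 0, ]
```

Note that lengths() is a slightly different test from is.null(): any zero-length element (for example a tibble with no columns, or an empty list) would also be dropped.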