Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use filter() (and other dplyr functions) inside nested data frames with map()

I'm trying to use map() of purrr package to apply filter() function to the data stored in a nested data frame.

"Why wouldn't you filter first, and then nest? - you might ask. That will work (and I'll show my desired outcome using such process), but I'm looking for ways to do it with purrr. I want to have just one data frame, with two list-columns, both being nested data frames - one full and one filtered.

I can achieve it now by performing nest() twice: once on all data, and second on filtered data:

library(tidyverse)

df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(c(1:10)),
  c = sample(c(91:100))
)

df_full_nested <- df %>% 
  group_by(a) %>% 
  nest(.key = 'full')

df_filter_nested <- df %>%
  filter(c >= 95) %>%  ##this is the key step
  group_by(a) %>% 
  nest(.key = 'filtered')

## Desired outcome - one data frame with 2 nested list-columns: one full and one filtered.
## How to achieve this without breaking it out into 2 separate data frames?
df_nested <- df_full_nested %>% 
  left_join(df_filter_nested, by = 'a')

The objects look like this:

> df
# A tibble: 10 x 3
       a     b     c
   <chr> <int> <int>
 1     y     8    93
 2     x     9    94
 3     y    10    99
 4     x     5    97
 5     y     2   100
 6     y     3    95
 7     x     7    96
 8     y     6    92
 9     x     4    91
10     x     1    98

> df_full_nested
# A tibble: 2 x 2
      a             full
  <chr>           <list>
1     y <tibble [5 x 2]>
2     x <tibble [5 x 2]>

> df_filter_nested
# A tibble: 2 x 2
      a         filtered
  <chr>           <list>
1     y <tibble [3 x 2]>
2     x <tibble [3 x 2]>

> df_nested
# A tibble: 2 x 3
      a             full         filtered
  <chr>           <list>           <list>
1     y <tibble [5 x 2]> <tibble [4 x 2]>
2     x <tibble [5 x 2]> <tibble [4 x 2]>

So, this works. But it is not clean. And in real life, I group by several columns, which means I also have to join on several columns... It gets hairy fast.

I'm wondering if there is a way to apply filter to the nested column. This way, I'd operate within the same object. Just cleaner and more understandable code.

I'm thinking it'd look like

df_full_nested %>% mutate(filtered = map(full, ...))

But I am not sure how to map filter() properly

Thanks!

like image 480
Taraas Avatar asked Nov 07 '17 19:11

Taraas


People also ask

What does the filter () function do in R?

The filter() method in R is used to subset a data frame based on a provided condition. If a row satisfies the condition, it must produce TRUE . Otherwise, non-satisfying rows will return NA values. Hence, the row will be dropped.

What is filter function in Dplyr?

The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [ .

Can you use filter on a data frame in R?

How to apply a filter on dataframe in R ? A filter () function is used to filter out specified elements from a dataframe that return TRUE value for the given condition (s). filter () helps to reduce a huge dataset into small chunks of datasets.

Is Dplyr a filter?

Of course, dplyr has 'filter()' function to do such filtering, but there is even more.


1 Answers

You can use map(full, ~ filter(., c >= 95)), where . stands for individual nested tibble, to which you can apply the filter directly:

df_nested_2 <- df_full_nested %>% mutate(filtered = map(full, ~ filter(., c >= 95)))

identical(df_nested, df_nested_2)
# [1] TRUE
like image 133
Psidom Avatar answered Sep 28 '22 11:09

Psidom