Use filter() (and other dplyr functions) inside nested data frames with map()

Tags: r, dplyr, purrr, tidyverse

I'm trying to use map() from the purrr package to apply filter() to the data stored in a nested data frame.

"Why wouldn't you filter first, and then nest?" you might ask. That would work (and I'll show my desired outcome using that process), but I'm looking for a way to do it with purrr. I want to end up with just one data frame that has two list-columns, both holding nested data frames: one full and one filtered.

I can achieve this now by calling nest() twice, once on all the data and once on the filtered data:

library(tidyverse)

df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(c(1:10)),
  c = sample(c(91:100))
)

df_full_nested <- df %>% 
  group_by(a) %>% 
  nest(.key = 'full')

df_filter_nested <- df %>%
  filter(c >= 95) %>%  ##this is the key step
  group_by(a) %>% 
  nest(.key = 'filtered')

## Desired outcome - one data frame with 2 nested list-columns: one full and one filtered.
## How to achieve this without breaking it out into 2 separate data frames?
df_nested <- df_full_nested %>% 
  left_join(df_filter_nested, by = 'a')

The objects look like this:

> df
# A tibble: 10 x 3
       a     b     c
   <chr> <int> <int>
 1     y     8    93
 2     x     9    94
 3     y    10    99
 4     x     5    97
 5     y     2   100
 6     y     3    95
 7     x     7    96
 8     y     6    92
 9     x     4    91
10     x     1    98

> df_full_nested
# A tibble: 2 x 2
      a             full
  <chr>           <list>
1     y <tibble [5 x 2]>
2     x <tibble [5 x 2]>

> df_filter_nested
# A tibble: 2 x 2
      a         filtered
  <chr>           <list>
1     y <tibble [3 x 2]>
2     x <tibble [3 x 2]>

> df_nested
# A tibble: 2 x 3
      a             full         filtered
  <chr>           <list>           <list>
1     y <tibble [5 x 2]> <tibble [3 x 2]>
2     x <tibble [5 x 2]> <tibble [3 x 2]>

So, this works, but it is not clean. In real life I group by several columns, which means I also have to join on several columns, and it gets hairy fast.

I'm wondering if there is a way to apply filter() to the nested column directly. That way I'd operate within the same object, and the code would be cleaner and easier to understand.

I'm thinking it would look something like this:

df_full_nested %>% mutate(filtered = map(full, ...))

But I am not sure how to map filter() properly.

Thanks!

asked Nov 07 '17 19:11

Taraas

1 Answer

You can use map(full, ~ filter(., c >= 95)), where . stands for each individual nested tibble, so the filter is applied to it directly:

df_nested_2 <- df_full_nested %>% mutate(filtered = map(full, ~ filter(., c >= 95)))

identical(df_nested, df_nested_2)
# [1] TRUE
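
As a rough, self-contained sketch of the same idea (not part of the original answer), the snippet below builds both list-columns in one pipeline. A few things here are assumptions made for illustration: the data are typed in by hand from the printed df in the question instead of being sampled, nest() is called without .key and the default data column is renamed afterwards (to sidestep differences between tidyr versions), and the extra columns sorted and n_filtered are hypothetical names added to show that other dplyr verbs can be mapped the same way.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hand-typed copy of the df printed in the question, so the result is reproducible.
df <- tibble(
  a = c("y", "x", "y", "x", "y", "y", "x", "y", "x", "x"),
  b = c(8L, 9L, 10L, 5L, 2L, 3L, 7L, 6L, 4L, 1L),
  c = c(93L, 94L, 99L, 97L, 100L, 95L, 96L, 92L, 91L, 98L)
)

df_nested <- df %>%
  group_by(a) %>%
  nest() %>%                 # nests the non-grouping columns into `data`
  ungroup() %>%
  rename(full = data) %>%
  mutate(
    # .x is one nested tibble; filter() runs once per group
    filtered = map(full, ~ filter(.x, c >= 95)),
    # any other dplyr verb can be mapped the same way (hypothetical extra column)
    sorted = map(full, ~ arrange(.x, desc(b))),
    # map_int() returns a plain integer vector, here the filtered row counts
    n_filtered = map_int(filtered, nrow)
  )

df_nested$n_filtered
# both groups keep 3 of their 5 rows for this data
```

Because everything happens inside a single mutate(), there is nothing to join, so grouping by several columns should not add any extra steps.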

answered Sep 28 '22 11:09

Psidom
