 

using purrr to map dplyr::select

Tags: r, dplyr, purrr

I have a data frame with a column of nested data frames, and I'd like to apply dplyr::select to each of those nested data frames. Here's an example:

 library(tidyverse)

 mtcars %>%
 group_by(cyl) %>%
 nest %>%
 mutate(data2 = ~map(data, dplyr::select(.,-mpg)))

I expected this to produce a data frame with three columns: cyl, the number of cylinders; data, the nested data; and data2, the same as data except that each nested data frame lacks the mpg column.

Instead R crashes:

 *** caught segfault ***
address 0x7ffc1e445000, cause 'memory not mapped'

Traceback:
 1: .Call(`_dplyr_mutate_impl`, df, dots)
 2: mutate_impl(.data, dots)
 3: mutate.tbl_df(., data2 = ~map(data, dplyr::select(., -mpg)))
 4: mutate(., data2 = ~map(data, dplyr::select(., -mpg)))
 5: function_list[[k]](value)
 6: withVisible(function_list[[k]](value))
 7: freduce(value, `_function_list`)
 8: `_fseq`(`_lhs`)
 9: eval(quote(`_fseq`(`_lhs`)), env, env)
10: eval(quote(`_fseq`(`_lhs`)), env, env)
11: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
12: mtcars %>% group_by(cyl) %>% nest %>% mutate(data2 = ~map(data,     dplyr::select(., -mpg)))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

I realize I could get the columns I want by applying the select operation before nesting, but that would be less analogous to my real problem. Could somebody please explain what I am doing wrong here? Thanks for any advice.

asked May 11 '18 at 18:05 by ohnoplus
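For comparison, the workaround the question mentions (dropping mpg before nesting) can be sketched as follows. This is a minimal sketch assuming dplyr and tidyr are installed; the name pre_selected is only illustrative. It produces nested tibbles without mpg, but there is then no nested copy that still contains mpg, which is why it doesn't fit the asker's real problem.

```r
library(dplyr)
library(tidyr)

# Workaround mentioned in the question: drop mpg first, then nest.
# The nested tibbles never contain mpg, so no copy with mpg is kept.
pre_selected <- mtcars %>%
  select(-mpg) %>%
  group_by(cyl) %>%
  nest()

# Number of columns in each nested tibble (one per cylinder group)
sapply(pre_selected$data, ncol)
```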


People also ask

What does dplyr::select() do in R?

The select() function from the dplyr package picks columns (variables) of a data frame. Columns can be chosen by name, by position, or with selection helpers such as starts_with(), and a minus sign drops a column, as in select(df, -mpg).
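A short sketch of those selection styles on the built-in mtcars data set, assuming dplyr is installed (the variable names are only illustrative):

```r
library(dplyr)

# By name: keep two columns
by_name <- select(mtcars, mpg, cyl)

# By position: keep the first three columns
by_position <- select(mtcars, 1:3)

# With a selection helper: keep every column whose name starts with "d"
by_helper <- select(mtcars, starts_with("d"))

# Negative selection: drop mpg and keep everything else
without_mpg <- select(mtcars, -mpg)

names(by_name)      # "mpg" "cyl"
names(by_position)  # "mpg" "cyl" "disp"
names(by_helper)    # "disp" "drat"
ncol(without_mpg)   # 10
```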

What does the map function from purrr do?

The map functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input. map() always returns a list. See the modify() family for versions that return an object of the same type as the input.
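A minimal illustration, assuming purrr is installed: map() returns a list, a formula such as ~ .x^2 is shorthand for an anonymous function, and a typed variant like map_dbl() returns an atomic vector instead.

```r
library(purrr)

# map() returns a list with one element per input element
squares <- map(1:3, function(x) x^2)

# A formula (~) is shorthand for an anonymous function; .x (or .) is the element
squares_formula <- map(1:3, ~ .x^2)

# map_dbl() returns a double vector instead of a list
squares_dbl <- map_dbl(1:3, ~ .x^2)
squares_dbl  # 1 4 9
```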

What does map_df() do in R?

map_df() applies a function to each element of its input, like map(), and then combines the results with bind_rows() into a single data frame. Its .id argument adds a column holding the names of the input list's elements, so the result is one long data frame.
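A small sketch of map_df() with .id, assuming purrr and dplyr are installed (map_df() uses dplyr's bind_rows() internally). The list pieces and its contents are made-up example data.

```r
library(purrr)

# A named list of small data frames (hypothetical example data)
pieces <- list(
  a = data.frame(value = 1:2),
  b = data.frame(value = 3:5)
)

# Apply a function to each data frame, then row-bind the results;
# .id = "source" adds a column holding each element's name
combined <- map_df(
  pieces,
  function(df) transform(df, doubled = value * 2),
  .id = "source"
)
combined
```

In purrr 1.0.0 and later, map_df() is superseded; the suggested replacement is map() followed by list_rbind(), whose names_to argument plays the role of .id.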


1 Answer

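You need to move the ~ from map() to select(), or use the approach from @Russ's comment. The ~ creates a formula, which only makes sense when it is passed to a function that accepts a formula as an argument. Here that function is purrr::map(), which converts ~ select(., -mpg) into a function and applies it to each nested data frame: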

mtcars %>%
    group_by(cyl) %>%
    nest %>%
    mutate(data2 = map(data, ~ select(., -mpg)))

# A tibble: 3 x 3
#    cyl data               data2            
#  <dbl> <list>             <list>           
#1     6 <tibble [7 × 10]>  <tibble [7 × 9]> 
#2     4 <tibble [11 × 10]> <tibble [11 × 9]>
#3     8 <tibble [14 × 10]> <tibble [14 × 9]>
answered Sep 17 '22 at 20:09 by Psidom
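To see why the fix works, purrr::as_mapper() shows what map() does with a formula: it turns it into a function whose first argument can be referred to as . or .x. The sketch below is an illustration, assuming dplyr, purrr and tidyr are installed; drop_mpg, with_formula and with_function are illustrative names. It also checks that the answer's formula gives the same result as an ordinary anonymous function.

```r
library(dplyr)
library(purrr)
library(tidyr)

# The formula becomes a function; calling it on mtcars drops mpg
drop_mpg <- as_mapper(~ select(.x, -mpg))
ncol(drop_mpg(mtcars))  # 10

nested <- mtcars %>%
  group_by(cyl) %>%
  nest()

# The formula from the answer and an ordinary anonymous function are equivalent
with_formula  <- nested %>% mutate(data2 = map(data, ~ select(.x, -mpg)))
with_function <- nested %>% mutate(data2 = map(data, function(df) select(df, -mpg)))

# Column counts of the nested tibbles in data and data2
map_int(with_formula$data, ncol)
map_int(with_formula$data2, ncol)
```

On R 4.1 or later, the native lambda syntax \(df) select(df, -mpg) is another way to write the same anonymous function.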