
Map function to second level of nested list using purrr

I'm trying to better understand functional programming in R. I'd like to stick to purrr, but I'll use rapply to demonstrate what I'm looking for below. First, a simple example of what I'm trying to understand:

You can use map to get the mean of each column of the mtcars dataset:

library(tidyverse)
mtcars %>% map_dbl(mean)

       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
  0.437500   0.406250   3.687500   2.812500 

But how would I use purrr to map mean to mtcars split by cyl?

library(tidyverse)
mtcars_split <- mtcars %>% split(.$cyl) 
mtcars_split %>% map(mean)
$`4`
[1] NA

$`6`
[1] NA

$`8`
[1] NA

Warning messages:
1: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA

I understand why this doesn't work: split creates a list and now I'm trying to map mean to each element of that new list, which are data.frames. This attempt at mapping is equivalent to (correct me if necessary):

mean(mtcars_split[[1]])
mean(mtcars_split[[2]])
mean(mtcars_split[[3]])

which obviously doesn't work - you can't just take the mean of a data.frame. What I really want is something that does this:

mtcars_split[[1]] %>% map(mean)
mtcars_split[[2]] %>% map(mean)
mtcars_split[[3]] %>% map(mean)

The problem is, I just can't wrap my head around how to do this in purrr. While looking for the solution to this (seemingly very basic) problem, I found rapply, which seems to do what I want, but outside of purrr (and the output isn't exactly in the format I'd like, but that's beside the point):

rapply(mtcars_split, mean, how = "unlist")
      4.mpg       4.cyl      4.disp        4.hp      4.drat        4.wt 
 26.6636364   4.0000000 105.1363636  82.6363636   4.0709091   2.2857273 
     4.qsec        4.vs        4.am      4.gear      4.carb       6.mpg 
 19.1372727   0.9090909   0.7272727   4.0909091   1.5454545  19.7428571 
      6.cyl      6.disp        6.hp      6.drat        6.wt      6.qsec 
  6.0000000 183.3142857 122.2857143   3.5857143   3.1171429  17.9771429 
       6.vs        6.am      6.gear      6.carb       8.mpg       8.cyl 
  0.5714286   0.4285714   3.8571429   3.4285714  15.1000000   8.0000000 
     8.disp        8.hp      8.drat        8.wt      8.qsec        8.vs 
353.1000000 209.2142857   3.2292857   3.9992143  16.7721429   0.0000000 
       8.am      8.gear      8.carb 
  0.1428571   3.2857143   3.5000000 

rapply being recursive apply is obviously a key to my answer - I believe I need nested maps - one to extract each column of the three data.frames in my mtcars_split, then one to run mean on each extracted column. However, I haven't been able to make that work.

I think this is addressed by Jenny Bryan in her purrr tutorial where she uses a map() inside a map(), but I can't follow what she is doing. She notes that the example might not be explained adequately earlier in the tutorial and I've asked her for elaboration here, but no answer yet (I know she is busy!).

twgardner2 asked Sep 24 '17 16:09


1 Answer

The recipe for this kind of problem is always the same:

Decompose the problem, solve it for an individual case, and then put it back together inside out.

As you observed, mtcars %>% split(.$cyl) gives you a list of lists (list of data.frames). You want to map mean over the inner lists.
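A quick way to confirm that structure is sketched below. It uses base R only and rebuilds mtcars_split without the pipe, which is assumed to be equivalent to the split in the question:

```r
# Rebuild the split from the question (base R only, no pipe needed).
mtcars_split <- split(mtcars, mtcars$cyl)

names(mtcars_split)          # one element per cyl value: "4", "6", "8"
class(mtcars_split[[1]])     # each element is a data.frame ...
is.list(mtcars_split[[1]])   # ... and a data.frame is itself a list of columns
names(mtcars_split[[1]])     # the elements of that inner list are the columns
```

Because each data.frame is a list of columns, mapping over one of them visits the columns one by one.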

So let’s do it for one list first:

mtcars_split[[1]] %>% map_dbl(mean)
# Or, equivalently:
map_dbl(mtcars_split[[1]], mean)
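As a sanity check on the single case, here is a minimal sketch that compares the result with base R's colMeans. The name means_4cyl is only illustrative, and the comparison assumes every column of mtcars is numeric (which it is):

```r
library(purrr)

mtcars_split <- split(mtcars, mtcars$cyl)

# One named numeric vector: the mean of every column for the 4-cylinder cars.
means_4cyl <- map_dbl(mtcars_split[[1]], mean)

# Individual means can be picked out by column name.
means_4cyl[["mpg"]]

# Should agree with the base R column means for the same data.frame.
all.equal(means_4cyl, colMeans(mtcars_split[[1]]))
```

The mpg value should match the 4.mpg entry in the rapply output from the question (about 26.66).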

This works. We’ve decomposed the problem and solved it for an individual case. In other words, given a list x and a transformation f, we’ve solved the problem for x[[1]] by executing f(x[[1]]) (which is equivalent to x[[1]] %>% f()).

Time to generalise it to all cases. And we already know how to generalise a transformation of an element x[[1]] to a whole list x: use map on that list:

x %>% map(~ .x %>% f())
# or, equivalently:
x %>% map(~ f(.x))
# or, equivalently:
map(x, ~ f(.x))
# or, finally:
map(x, f)
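To see that these four spellings really are interchangeable, here is a small sketch with a toy list and a stand-in function. Both x and f below are made up for illustration and are not part of the original problem:

```r
library(purrr)

# Hypothetical toy inputs: any list and any one-argument function would do.
x <- list(a = 1:3, b = c(10, 20), c = 5)
f <- function(v) sum(v) / length(v)

r1 <- x %>% map(~ .x %>% f())
r2 <- x %>% map(~ f(.x))
r3 <- map(x, ~ f(.x))
r4 <- map(x, f)

# All four forms should produce the same list.
identical(r1, r2) && identical(r2, r3) && identical(r3, r4)
```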

Let’s do the exact same thing, with x and f substituted by mtcars_split and map_dbl(mean), respectively:

mtcars_split %>% map(~ .x %>% map_dbl(mean))
# or, equivalently:
mtcars_split %>% map(~ map_dbl(.x, mean))

And this can be simplified the same way as our example above, because map passes any additional arguments (here, mean) on to the function it calls:

mtcars_split %>% map(map_dbl, mean)
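
The result is a list with one named numeric vector per cyl group. If a single table is more convenient than the rapply output from the question, one possible layout (a sketch, not the only option; the names means_by_cyl and means_matrix are illustrative) is to stack those vectors with base R's rbind:

```r
library(purrr)

mtcars_split <- split(mtcars, mtcars$cyl)

# Nested map: the outer map walks the data.frames, the inner map_dbl walks the columns.
means_by_cyl <- map(mtcars_split, map_dbl, mean)

# Each element is a named numeric vector, so individual means are easy to pick out.
means_by_cyl[["6"]][["hp"]]

# Stack the three vectors into a matrix: one row per cyl group, one column per variable.
means_matrix <- do.call(rbind, means_by_cyl)
dim(means_matrix)
```

The row names of means_matrix are the cyl values and the column names are the original variable names.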
Konrad Rudolph answered Oct 19 '22 00:10