I'm trying to better understand functional programming in R. I'd like to stick to purrr, but I'll use rapply to demonstrate what I'm looking for below. First, a simple example of what I'm trying to understand:
You can use map to get the mean of each column of the mtcars dataset:
library(tidyverse)
mtcars %>% map_dbl(mean)
mpg cyl disp hp drat wt qsec
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
vs am gear carb
0.437500 0.406250 3.687500 2.812500
But how would I use purrr to map mean to mtcars split by cyl?
library(tidyverse)
mtcars_split <- mtcars %>% split(.$cyl)
mtcars_split %>% map(mean)
$`4`
[1] NA
$`6`
[1] NA
$`8`
[1] NA
Warning messages:
1: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
I understand why this doesn't work: split creates a list, and I'm now mapping mean over the elements of that list, each of which is a data.frame. This attempt at mapping is equivalent to (correct me if necessary):
mean(mtcars_split[[1]])
mean(mtcars_split[[2]])
mean(mtcars_split[[3]])
which obviously doesn't work - you can't just take the mean of a data.frame. What I really want is something that does this:
mtcars_split[[1]] %>% map(mean)
mtcars_split[[2]] %>% map(mean)
mtcars_split[[3]] %>% map(mean)
The problem is, I just can't wrap my head around how to do this in purrr. While looking for the solution to this (seemingly very basic) problem, I found rapply, which seems to do what I want, but outside of purrr (and the output isn't exactly in the format I'd like, but that's beside the point):
rapply(mtcars_split, mean, how = "unlist")
4.mpg 4.cyl 4.disp 4.hp 4.drat 4.wt
26.6636364 4.0000000 105.1363636 82.6363636 4.0709091 2.2857273
4.qsec 4.vs 4.am 4.gear 4.carb 6.mpg
19.1372727 0.9090909 0.7272727 4.0909091 1.5454545 19.7428571
6.cyl 6.disp 6.hp 6.drat 6.wt 6.qsec
6.0000000 183.3142857 122.2857143 3.5857143 3.1171429 17.9771429
6.vs 6.am 6.gear 6.carb 8.mpg 8.cyl
0.5714286 0.4285714 3.8571429 3.4285714 15.1000000 8.0000000
8.disp 8.hp 8.drat 8.wt 8.qsec 8.vs
353.1000000 209.2142857 3.2292857 3.9992143 16.7721429 0.0000000
8.am 8.gear 8.carb
0.1428571 3.2857143 3.5000000
rapply being recursive apply is obviously a key to my answer - I believe I need nested maps - one to extract each column of the three data.frames in my mtcars_split, then one to run mean on each extracted column. However, I haven't been able to make that work.
I think this is addressed by Jenny Bryan in her purrr tutorial where she uses a map() inside a map(), but I can't follow what she is doing. She notes that the example might not be explained adequately earlier in the tutorial and I've asked her for elaboration here, but no answer yet (I know she is busy!).
The recipe for this kind of problem is always the same:
Decompose the problem, solve it for an individual case, and then put it back together inside out.
As you observed, mtcars %>% split(.$cyl) gives you a list of lists (list of data.frames). You want to map mean over the inner lists.
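To make the "list of lists" structure concrete, here is a small sketch that inspects the split object. It only assumes that purrr is installed and uses the built-in mtcars data; the comments describe what I would expect to see, so treat it as an illustration rather than a definitive reference.

```r
library(purrr)

mtcars_split <- split(mtcars, mtcars$cyl)

class(mtcars_split)            # "list": the outer container
names(mtcars_split)            # "4" "6" "8": one element per cyl value
map_chr(mtcars_split, class)   # every element is a "data.frame"

class(mtcars_split[1])         # "list": single brackets keep the list wrapper
class(mtcars_split[[1]])       # "data.frame": this is what map() hands to the function
is.list(mtcars_split[[1]])     # TRUE: a data.frame is itself a list of columns
```

The last line is the reason a second map over each element iterates over columns.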
So let’s do it for one list first:
mtcars_split[[1]] %>% map_dbl(mean)
# Or, equivalently:
map_dbl(mtcars_split[[1]], mean)
This works. We’ve decomposed the problem and solved it for an individual case. In other words, given a list x and a transformation f, we’ve solved the problem for the single element x[[1]] by executing f(x[[1]]) (which is equivalent to x[[1]] %>% f()).
Time to generalise it to all cases. And we already know how to generalise a transformation of an element x[[1]] to a whole list x: use map on that list:
x %>% map(~ .x %>% f())
# or, equivalently:
x %>% map(~ f(.x))
# or, equivalently:
map(x, ~ f(.x))
# or, finally:
map(x, f)
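The four spellings above can be checked against each other on a toy input. In this sketch x and f are hypothetical stand-ins chosen for illustration (a small named list and sum); they are not from the original question.

```r
library(purrr)

# Hypothetical toy inputs, chosen only for illustration
x <- list(a = 1:3, b = 4:6)
f <- sum

r1 <- x %>% map(~ .x %>% f())  # lambda with a pipe inside
r2 <- x %>% map(~ f(.x))       # lambda with a plain call
r3 <- map(x, ~ f(.x))          # same, without the outer pipe
r4 <- map(x, f)                # pass the function directly

# All four spellings produce the same list
identical(r1, r2) && identical(r2, r3) && identical(r3, r4)
```

Which form to use is mostly a matter of taste; the lambda forms become necessary once the call needs arguments in positions other than the first.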
Let’s do the exact same thing, replacing x with mtcars_split and f with “apply map_dbl with mean”:
mtcars_split %>% map(~ .x %>% map_dbl(mean))
# or, equivalently:
mtcars_split %>% map(~ map_dbl(.x, mean))
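And this can be simplified the same way as our example above, because map passes any additional arguments (here, mean) on to the function it calls (here, map_dbl):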
mtcars_split %>% map(map_dbl, mean)
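As a sanity check, the sketch below confirms that the three spellings agree. The final step, stacking the per-group means into a matrix with do.call(rbind, ...), is my own addition and only one possible layout; it assumes you want one row per cyl group, which the question does not specify.

```r
library(purrr)

mtcars_split <- split(mtcars, mtcars$cyl)

# The three spellings from the answer
means_a <- mtcars_split %>% map(~ .x %>% map_dbl(mean))
means_b <- mtcars_split %>% map(~ map_dbl(.x, mean))
means_c <- mtcars_split %>% map(map_dbl, mean)

# They should all be the same list of named numeric vectors
identical(means_a, means_b) && identical(means_b, means_c)

# Assumption: a rectangular layout is wanted, one row per cyl group.
# Each list element is a named numeric vector, so rbind stacks them by row.
means_matrix <- do.call(rbind, means_c)
dim(means_matrix)          # 3 cyl groups by 11 columns
means_matrix["4", "mpg"]   # mean mpg of the 4-cylinder cars
```

The row names of the matrix come from the names of the split list, so a group can be looked up by its cyl value as a string.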