I'm trying to better understand functional programming in R. I'd like to stick to purrr, but I'll use rapply to demonstrate what I'm looking for below. First, a simple example of what I'm trying to understand:
You can use map to get the mean of each column of the mtcars dataset:
library(tidyverse)
mtcars %>% map_dbl(mean)
mpg cyl disp hp drat wt qsec
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
vs am gear carb
0.437500 0.406250 3.687500 2.812500
But how would I use purrr to map mean to mtcars split by cyl?
library(tidyverse)
mtcars_split <- mtcars %>% split(.$cyl)
mtcars_split %>% map(mean)
$`4`
[1] NA
$`6`
[1] NA
$`8`
[1] NA
Warning messages:
1: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(.x[[i]], ...) :
argument is not numeric or logical: returning NA
I understand why this doesn't work: split creates a list, and now I'm trying to map mean to each element of that new list, each of which is a data.frame. This attempt at mapping is equivalent to (correct me if necessary):
mean(mtcars_split[[1]])
mean(mtcars_split[[2]])
mean(mtcars_split[[3]])
which obviously doesn't work: you can't just take the mean of a data.frame. What I really want is something that does this:
mtcars_split[[1]] %>% map(mean)
mtcars_split[[2]] %>% map(mean)
mtcars_split[[3]] %>% map(mean)
The problem is, I just can't wrap my head around how to do this in purrr. While looking for the solution to this (seemingly very basic) problem, I found rapply, which seems to do what I want, but outside of purrr (and the output isn't exactly in the format I'd like, but that's beside the point):
rapply(mtcars_split, mean, how = "unlist")
4.mpg 4.cyl 4.disp 4.hp 4.drat 4.wt
26.6636364 4.0000000 105.1363636 82.6363636 4.0709091 2.2857273
4.qsec 4.vs 4.am 4.gear 4.carb 6.mpg
19.1372727 0.9090909 0.7272727 4.0909091 1.5454545 19.7428571
6.cyl 6.disp 6.hp 6.drat 6.wt 6.qsec
6.0000000 183.3142857 122.2857143 3.5857143 3.1171429 17.9771429
6.vs 6.am 6.gear 6.carb 8.mpg 8.cyl
0.5714286 0.4285714 3.8571429 3.4285714 15.1000000 8.0000000
8.disp 8.hp 8.drat 8.wt 8.qsec 8.vs
353.1000000 209.2142857 3.2292857 3.9992143 16.7721429 0.0000000
8.am 8.gear 8.carb
0.1428571 3.2857143 3.5000000
rapply, being a recursive apply, is obviously a key to my answer. I believe I need nested maps: one to extract each column of the three data.frames in my mtcars_split, then one to run mean on each extracted column. However, I haven't been able to make that work.
I think this is addressed by Jenny Bryan in her purrr tutorial, where she uses a map() inside a map(), but I can't follow what she is doing. She notes that the example might not be explained adequately earlier in the tutorial and I've asked her for elaboration here, but no answer yet (I know she is busy!).
The recipe for this kind of problem is always the same:
Decompose the problem, solve it for an individual case, and then put it back together inside out.
As you observed, mtcars %>% split(.$cyl) gives you a list of lists (a list of data.frames, each of which is itself a list of columns). You want to map mean over the inner lists.
So let’s do it for one list first:
mtcars_split[[1]] %>% map_dbl(mean)
# Or, equivalently:
map_dbl(mtcars_split[[1]], mean)
This works. We’ve decomposed the problem and successfully solved it for an individual case. In other words, given a list x and a transformation f, we’ve solved the problem for x[[1]] by executing f(x[[1]]) (which is equivalent to x[[1]] %>% f()).
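If you want to convince yourself that the single-case step is right before generalising, one option is to compare it against base R. The sketch below is only a sanity check and not part of the solution itself; the variable name first_group_means and the use of colMeans() are my own choices for illustration.

```r
library(purrr)

# Same split as in the question, written without the pipe.
mtcars_split <- split(mtcars, mtcars$cyl)

# Single case from the text: one mean per column of the first group.
first_group_means <- map_dbl(mtcars_split[[1]], mean)

# Cross-check against base R; colMeans() also returns one named value per column.
all.equal(first_group_means, colMeans(mtcars_split[[1]]))
```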
Time to generalise it to all cases. And we already know how to generalise a transformation of an element x[[1]] to a whole list x: use map on that list:
x %>% map(~ .x %>% f())
# or, equivalently:
x %>% map(~ f(.x))
# or, equivalently:
map(x, ~ f(.x))
# or, finally:
map(x, f)
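To see that these four spellings really are interchangeable, here is a small check with made-up inputs. The list x and the choice of sum for f are hypothetical stand-ins picked only for this illustration; any list and any one-argument function should behave the same way.

```r
library(purrr)
library(magrittr)  # provides %>%

# Hypothetical stand-ins for the generic x and f used in the text.
x <- list(a = 1:3, b = 4:6)
f <- sum

r1 <- x %>% map(~ .x %>% f())
r2 <- x %>% map(~ f(.x))
r3 <- map(x, ~ f(.x))
r4 <- map(x, f)

# All four forms should produce the same list.
identical(r1, r2) && identical(r2, r3) && identical(r3, r4)
```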
Let’s do the exact same thing, with x and f() substituted by mtcars_split and map_dbl(mean), respectively:
mtcars_split %>% map(~ .x %>% map_dbl(mean))
# or, equivalently:
mtcars_split %>% map(~ map_dbl(.x, mean))
And this can be simplified the same way as our example above:
mtcars_split %>% map(map_dbl, mean)
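The short form works because map forwards any extra arguments to the function it calls, so each data.frame is passed to map_dbl together with mean. Putting the pieces together, a minimal end-to-end sketch might look like the following. The names means_by_cyl and means_matrix are mine, and binding the result into a matrix with do.call(rbind, ...) is just one possible way to get a more table-like output than rapply gives.

```r
library(purrr)
library(magrittr)  # provides %>%

mtcars_split <- split(mtcars, mtcars$cyl)

# Outer map walks the three data.frames; inner map_dbl walks their columns.
means_by_cyl <- mtcars_split %>% map(map_dbl, mean)

# The values should agree with the rapply call from the question.
all.equal(
  unname(unlist(means_by_cyl)),
  unname(rapply(mtcars_split, mean, how = "unlist"))
)

# Optional: one row per cyl group, one column per variable.
means_matrix <- do.call(rbind, means_by_cyl)
```

Individual values can then be looked up by name, for example means_by_cyl[["4"]][["mpg"]] or means_matrix["4", "mpg"].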