
Supplying the list argument to .l in pmap() or pwalk()

Tags: r, purrr

I am unclear as to when arguments can be explicitly paired in pmap() and pwalk()'s .l argument. Sometimes these purrr functions only seem to work when the dataframe supplied to them has names that map directly to the expected arguments of the function named in .f. Other times a full dataframe can be supplied to pmap() and the variables can be pair mapped explicitly.

library(dplyr)
library(purrr)
library(tibble)

set.seed(57)

ds_mt <- 
  mtcars %>% 
  rownames_to_column("model") %>% 
  mutate(am = factor(am, labels = c("auto", "manual"))) %>% 
  select(model, mpg, wt, cyl, am) %>% 
  sample_n(3)

foo <- function(model, am, mpg){
  print(
    paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
  )
}

Why do these code chunks work?

ds_mt %>%
  select(model, am, mpg) %>%
  pwalk(
    .l = .,
    .f = foo
  )

# example with explicit pair mapping
ds_mt %>% 
  mutate(
    new_var = 
      pmap(
        .l = list(model=model, am=am, mpg=mpg),
        .f = foo
      )
  )

While these code chunks fail?

ds_mt %>%
  pwalk(
    .l = list(model, am, mpg),
    .f = foo
  )

ds_mt %>%
  pwalk(
    .l = list(model=model, am=am, mpg=mpg),
    .f = foo
  )

Joe asked Sep 08 '25 03:09


1 Answer

Your problem has nothing to do with pmap() or pwalk(). It stems from some misunderstanding of how the pipe and the mutate() function work.


First, the pipe:

Unless a dot specifies otherwise, the pipe passes the left-hand side (LHS) as the first argument of the function on the right-hand side (RHS).

So this works:

ds_mt %>% 
  select(model, am, mpg) %>% 
  pwalk(
    .l = .,
    .f = foo
  )

because your list (= your data frame since a data frame is a list of vectors), which is the LHS of the pipe, is used as the first argument of pwalk() on the RHS.

In this case, you actually do not need the dot and you could have written it much more simply as:

ds_mt %>% 
  select(model, am, mpg) %>% 
  pwalk(foo)
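
As a small illustration of why the select() step matters here, the sketch below uses a hypothetical toy data frame and a hypothetical function label() (neither is from the question). It assumes only that pmap-style functions pass every element of the list, that is, every column of the data frame, to the function as an argument:

```r
library(purrr)

# Hypothetical toy data and function, used only for illustration
toy <- data.frame(
  model = c("A", "B"),
  mpg   = c(21, 30),
  wt    = c(2.6, 3.2),
  stringsAsFactors = FALSE
)
label <- function(model, mpg) paste(model, "gets", mpg, "mpg")

# A data frame is a list of columns, so it can be used directly as .l
df_is_list <- is.list(toy)

# Works: only the columns that label() expects are passed
labels <- pmap_chr(toy[c("model", "mpg")], label)

# Fails: the wt column is passed too, and label() has no wt argument
extra_col_fails <- inherits(try(pmap_chr(toy, label), silent = TRUE), "try-error")
```

If dropping columns is inconvenient, one common alternative is to give the function a `...` argument so that it absorbs the columns it does not use.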

On the other hand, when you try to run:

ds_mt %>% 
  pwalk(
    .l = list(model, am, mpg),
    .f = foo
  )

the pipe does not make the columns of ds_mt available by name. The expression list(model, am, mpg) is evaluated as ordinary R code, so R looks for a standalone object called model and has no idea what it is, since you don't have any object with that name.

For this expression to work, you can write it, without the pipe, in this manner:

pwalk(
  .l = list(ds_mt$model, ds_mt$am, ds_mt$mpg),
  .f = foo
)

Or, if you want to use the pipe, you have to put a dot wherever the LHS is needed, since it is no longer simply the first argument of the function on the RHS. Because those dots sit inside a nested function call (list()), you also have to wrap the RHS in curly braces; otherwise R would still pass the LHS as the first argument of the outermost function of the RHS:

ds_mt %>% {
  pwalk(
    .l = list(.$model, .$am, .$mpg),
    .f = foo
  )
}

or, in a style a little more compact:

ds_mt %>% {pwalk(list(.$model, .$am, .$mpg), foo)}

In conclusion, it is not enough to have an object on the LHS of a pipe for R to magically apply it at the right places of the RHS (though I think your confusion might come from the case of dplyr functions, see below). By default, the LHS is used as the first argument of the function on the RHS, and in that case you don't need any dot. For any other placement, you need a dot at each place where the LHS is needed. And when those dots appear only inside nested function calls (as you have here), you also need to enclose the RHS in curly braces, or R will pass the LHS as the first argument of your outermost RHS function.
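
The placement rules can be checked with a minimal sketch. It assumes the magrittr pipe (the one dplyr and purrr re-export) and a hypothetical helper count_args() that simply reports how many arguments it received:

```r
library(magrittr)

# Hypothetical helper: returns the number of arguments it was called with
count_args <- function(...) length(list(...))

# Default: the LHS becomes the first argument, i.e. count_args(1:3, "a")
n_default <- 1:3 %>% count_args("a")

# Dot only inside a nested call: the LHS is STILL inserted as the first
# argument, i.e. count_args(1:3, list(1:3))
n_nested <- 1:3 %>% count_args(list(.))

# Curly braces switch off first-argument insertion, i.e. count_args(list(1:3))
n_braces <- 1:3 %>% { count_args(list(.)) }
```

Here n_default and n_nested are both 2, while n_braces is 1, which is why the braces are needed when the dot appears only inside list().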


Now, to your mutate() example:

ds_mt %>% 
  mutate(
    new_var = pmap(
      .l = list(model, am, mpg),
      .f = foo
    )
  )

This one works because mutate() evaluates its arguments in the context of the data frame (dplyr calls this data masking), so the data frame and dollar sign are not necessary when referring to variables inside it. Here, R does not wonder what model is because you are in a "mutate framework", so to speak, and R understands model as meaning .$model or ds_mt$model. So here again, this has nothing to do with pmap() or pwalk() but is a particularity of the dplyr functions (it would be the same with summarise()). I guess this notational shortcut that dplyr functions allow is what led to your confusion.
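
To make the contrast concrete, here is a minimal sketch with a hypothetical toy tibble and a hypothetical function add() (neither is from the question). It compares bare column names inside mutate() with explicit $ access outside it, and with base R's with(), which provides a similar evaluation context:

```r
library(dplyr)
library(purrr)

# Hypothetical toy data and function, used only for illustration
toy <- tibble(x = c(1, 2, 3), y = c(10, 20, 30))
add <- function(a, b) a + b

# Inside mutate(), x and y resolve to columns of toy
masked <- toy %>% mutate(total = pmap_dbl(list(x, y), add))

# Outside a data-masking function, the columns must be reached explicitly
explicit <- pmap_dbl(list(toy$x, toy$y), add)

# with() evaluates its expression with the columns of toy in scope
via_with <- with(toy, pmap_dbl(list(x, y), add))
```

All three approaches compute the same totals; they differ only in how the column names are resolved.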


Finally, what you call "explicit pair mapping" has no effect. Since you defined your function foo() as accepting 3 arguments, as long as you keep the arguments in the right order,

foo(model = model, am = am, mpg = mpg)

and

foo(model, am, mpg)

are exactly the same. If you swap the arguments around, however, you do need to be explicit. For instance:

foo(am = am, model = model, mpg = mpg)
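
The same rule carries over to the list given to pmap(): unnamed elements are matched to the function's arguments by position, and named elements by name. A minimal sketch, using hypothetical vectors and a hypothetical function describe() that returns a string instead of printing it:

```r
library(purrr)

# Hypothetical stand-ins for the question's columns and function
model <- c("A", "B")
am    <- c("auto", "manual")
mpg   <- c(21, 30)
describe <- function(model, am, mpg) paste(model, am, mpg)

# Unnamed list: elements are matched to arguments by position
by_position <- pmap_chr(list(model, am, mpg), describe)

# Named list: elements are matched by name, so the order is free
by_name <- pmap_chr(list(mpg = mpg, model = model, am = am), describe)

# Unnamed list in the wrong order: mpg silently lands in the model slot
wrong_order <- pmap_chr(list(mpg, model, am), describe)
```

by_position and by_name give the same strings, while wrong_order runs without error but puts the values in the wrong places, which is the case where names are worth the extra typing.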

prosoitos answered Sep 09 '25 18:09