
dplyr::mutate to add multiple values

Tags: r, dplyr

There are already a couple of issues about this on the dplyr GitHub repo, and at least one related SO question, but I don't think any of them quite covers my question.

  • Adding multiple columns in a dplyr mutate call is more or less what I want, but the answer there relies on a special-case solution (tidyr::separate) that doesn't (I think) work for me.
  • This issue ("summarise or mutate with functions returning multiple values/columns") says "use do()".

Here's my use case: I want to compute exact binomial confidence intervals

    dd <- data.frame(x=c(3,4),n=c(10,11))
    get_binCI <- function(x,n) {
        rbind(setNames(c(binom.test(x,n)$conf.int),c("lwr","upr")))
    }
    with(dd[1,],get_binCI(x,n))
    ##             lwr       upr
    ## [1,] 0.06673951 0.6524529
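
Part of the reason a plain mutate() call is not enough: binom.test() is not vectorized over x and n. Its documentation says a length-2 x is read as c(successes, failures), with n then ignored, so passing the two columns of dd directly would quietly describe 3 successes in 7 trials. A minimal sketch of that pitfall (the variable names are only illustrative):

```r
dd <- data.frame(x = c(3, 4), n = c(10, 11))

# Passing whole columns: x = c(3, 4) is taken as 3 successes and 4 failures,
# and n is ignored.
ci_columns <- c(binom.test(dd$x, dd$n)$conf.int)

# The same interval, requested explicitly as 3 successes out of 7 trials.
ci_3_of_7 <- c(binom.test(3, 7)$conf.int)

# The interval actually wanted for the first row: 3 successes out of 10.
ci_row1 <- c(binom.test(dd$x[1], dd$n[1])$conf.int)

# Compare the column-wise call with each of the two explicit calls.
isTRUE(all.equal(ci_columns, ci_3_of_7))
isTRUE(all.equal(ci_columns, ci_row1))
```

This is why each approach in this thread calls binom.test() once per row rather than once on the whole columns.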

I can get this done with do() but I wonder if there's a more expressive way to do this (it feels like mutate() could have a .n argument as is being discussed for summarise() ...)

    library("dplyr")
    dd %>% group_by(x,n) %>%
        do(cbind(.,get_binCI(.$x,.$n)))

    ## Source: local data frame [2 x 4]
    ## Groups: x, n
    ##
    ##   x  n        lwr       upr
    ## 1 3 10 0.06673951 0.6524529
    ## 2 4 11 0.10926344 0.6920953

Ben Bolker asked Apr 13 '15 20:04


2 Answers

Yet another variant, although I think we're all splitting hairs here.

    > dd <- data.frame(x=c(3,4),n=c(10,11))
    > get_binCI <- function(x,n) {
    +   as_data_frame(setNames(as.list(binom.test(x,n)$conf.int),c("lwr","upr")))
    + }
    >
    > dd %>%
    +   group_by(x,n) %>%
    +   do(get_binCI(.$x,.$n))
    Source: local data frame [2 x 4]
    Groups: x, n

      x  n        lwr       upr
    1 3 10 0.06673951 0.6524529
    2 4 11 0.10926344 0.6920953

Personally, if we're just going by readability, I find this preferable:

    foo  <- function(x,n){
        bi <- binom.test(x,n)$conf.int
        data_frame(lwr = bi[1],
                   upr = bi[2])
    }

    dd %>%
        group_by(x,n) %>%
        do(foo(.$x,.$n))

...but now we're really splitting hairs.
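
For what it's worth, newer dplyr releases offer something closer to what the question asks for. A sketch, assuming dplyr 1.0.0 or later, where an unnamed mutate() expression that returns a data frame is unpacked into columns; get_binCI_df is a hypothetical helper name, not part of the original code in this thread:

```r
library(dplyr)

dd <- data.frame(x = c(3, 4), n = c(10, 11))

# Hypothetical helper: return a one-row data frame so that mutate()
# can unpack it into separate columns.
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

res <- dd %>%
  rowwise() %>%                    # evaluate the helper once per row
  mutate(get_binCI_df(x, n)) %>%   # unnamed data frame result becomes lwr, upr
  ungroup()

res
```

Because rowwise() stands in for group_by(x,n) here, rows that happen to share the same (x, n) pair are still processed one at a time.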


joran answered Oct 14 '22 16:10


Yet another option could be to use the purrr::map family of functions.

If you replace rbind with dplyr::bind_rows in the get_binCI function:

    library(tidyverse)

    dd <- data.frame(x = c(3, 4), n = c(10, 11))
    get_binCI <- function(x, n) {
      bind_rows(setNames(c(binom.test(x, n)$conf.int), c("lwr", "upr")))
    }

You can use purrr::map2 with tidyr::unnest:

    # Name the list-column explicitly; newer tidyr versions warn if it is omitted.
    dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)

    #>   x  n        lwr       upr
    #> 1 3 10 0.06673951 0.6524529
    #> 2 4 11 0.10926344 0.6920953

Or purrr::map2_dfr with dplyr::bind_cols:

    dd %>% bind_cols(map2_dfr(.$x, .$n, get_binCI))

    #>   x  n        lwr       upr
    #> 1 3 10 0.06673951 0.6524529
    #> 2 4 11 0.10926344 0.6920953
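
In purrr 1.0.0 and later, map2_dfr() is marked as superseded in favour of map2() followed by list_rbind(). A sketch of the same idea under that assumption; get_binCI_df is a hypothetical variant of get_binCI that returns a one-row data frame, since list_rbind() expects data frames:

```r
library(dplyr)
library(purrr)

dd <- data.frame(x = c(3, 4), n = c(10, 11))

# Hypothetical variant of get_binCI: one-row data frame per (x, n) pair.
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

# One data frame per row of dd, stacked into a single two-column data frame.
ci_tbl <- map2(dd$x, dd$n, get_binCI_df) %>% list_rbind()

# Attach the intervals to the original columns.
res <- bind_cols(dd, ci_tbl)
res
```

The result should have the same four columns as the map2_dfr() version; only the row-binding step is spelled out separately.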

markdly answered Oct 14 '22 17:10