
How to use custom functions in mutate (dplyr)?

Tags: r, dplyr

I'm rewriting all my code using dplyr and need help with the mutate / mutate_at functions. All I need is to apply a custom function to two columns in my table. Ideally, I would reference these columns by their indices, but right now I can't make it work even when referencing them by name.

The function is:

binom.test.p <- function(x) {
  if (is.na(x[1])|is.na(x[2])|(x[1]+x[2])<10) {
    return(NA)
  } 
  else {
    return(binom.test(x, alternative="two.sided")$p.value)
  }
} 

My data:

table <- data.frame(geneId=c("a", "b", "c", "d"), ref_SG1_E2_1_R1_Sum = c(10,20,10,15), alt_SG1_E2_1_R1_Sum = c(10,20,10,15))

So I do:

table %>%
  mutate(Ratio=binom.test.p(c(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum)))
Error: incorrect length of 'x'

If I do:

table %>%
  mutate(Ratio=binom.test.p(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))
Error: unused argument (c(10, 20, 10, 15))

The second error is probably because my function needs one vector and gets two parameters instead.

But even setting my function aside, this works:

table %>%
  mutate(sum = ref_SG1_E2_1_R1_Sum + alt_SG1_E2_1_R1_Sum)

This doesn't:

table %>%
  mutate(.cols=c(2:3), .funs=funs(sum=sum(.)))
Error: wrong result size (2), expected 4 or 1

So it's probably my misunderstanding of how dplyr works.

asked Jun 23 '17 22:06 by kintany



1 Answer

In many cases it's sufficient to create a vectorized version of the function:

your_function_V <- Vectorize(your_function)

The vectorized function can then be used inside dplyr's mutate(). See also this blog post.
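As a minimal sketch of why this helps: mutate() hands a whole column to the function, while `if` can only test a single value. Vectorize() wraps the function so that it is called once per element. The function `label_count` and the vector `counts` below are hypothetical names used only for illustration.

```r
# Hypothetical scalar-only function: `if` can only inspect one value at a time
label_count <- function(n) {
  if (is.na(n) || n < 10) {
    return("low")
  }
  "ok"
}

# Vectorize() returns a wrapper that calls label_count() once per element
label_count_V <- Vectorize(label_count)

counts <- c(3, 12, NA, 10)
labels <- label_count_V(counts)
print(labels)
# "low" "ok" "low" "ok"
```

Because the wrapper returns one value per input element, its result has the same length as the column and can be assigned directly inside mutate().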

However, the function posted in the question takes a single length-two vector whose elements come from two different columns. We therefore need to change it to accept the two values as separate arguments before vectorizing it.

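binom.test.p <- function(x, y) {
  # combine the two counts into the length-2 vector that binom.test() expects
  x <- c(x, y)

  # skip the test when a count is missing or the total is below 10
  if (is.na(x[1]) || is.na(x[2]) || (x[1] + x[2]) < 10) {
    return(NA)
  } else {
    return(binom.test(x, alternative = "two.sided")$p.value)
  }
}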

# vectorized function
binom.test.p_V <- Vectorize(binom.test.p)

table %>%
  mutate(Ratio = binom.test.p_V(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))

# works!
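The question also asks about referencing the columns by index. One possible sketch, using only base R rather than dplyr, is to call mapply() on the columns selected by position; Vectorize() is itself a wrapper around mapply(). The names `binom_p` and `counts` and the sample values are assumptions made for this illustration, not part of the original post.

```r
# Hypothetical two-argument version of the test, written for scalar inputs
binom_p <- function(x, y) {
  # skip the test when a count is missing or the total is below 10
  if (is.na(x) || is.na(y) || (x + y) < 10) {
    return(NA_real_)
  }
  binom.test(c(x, y), alternative = "two.sided")$p.value
}

# Illustrative data; the last row has fewer than 10 reads in total
counts <- data.frame(
  geneId = c("a", "b", "c", "d"),
  ref    = c(10, 20, 10, 3),
  alt    = c(10, 20, 10, 4)
)

# Columns 2 and 3 are picked by position; mapply() walks them row by row
counts$Ratio <- mapply(binom_p, counts[[2]], counts[[3]])
print(counts)
```

Rows with equal ref and alt counts give a p-value of 1, and the low-coverage row gives NA. Selecting by position is brittle if the column order changes, so selecting by name is usually the safer choice.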
answered Sep 22 '22 23:09 by Martin