I'm rewriting all my code using dplyr, and need help with mutate / mutate_at function. All I need is to apply custom function to two columns in my table. Ideally, I would reference these columns by their indices, but now I can't make it work even referencing by names.
The function is:
binom.test.p <- function(x) {
if (is.na(x[1])|is.na(x[2])|(x[1]+x[2])<10) {
return(NA)
}
else {
return(binom.test(x, alternative="two.sided")$p.value)
}
}
My data:
table <- data.frame(geneId=c("a", "b", "c", "d"), ref_SG1_E2_1_R1_Sum = c(10,20,10,15), alt_SG1_E2_1_R1_Sum = c(10,20,10,15))
So I do:
table %>%
mutate(Ratio=binom.test.p(c(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum)))
Error: incorrect length of 'x'
If I do:
table %>%
mutate(Ratio=binom.test.p(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))
Error: unused argument (c(10, 20, 10, 15))
The second error is probably because my function needs one vector and gets two parameters instead.
But even forgetting about my function. This works:
table %>%
mutate(sum = ref_SG1_E2_1_R1_Sum + alt_SG1_E2_1_R1_Sum)
This doesn't:
table %>%
mutate(.cols=c(2:3), .funs=funs(sum=sum(.)))
Error: wrong result size (2), expected 4 or 1
So it's probably my misunderstanding of how dplyr works.
In R programming, the mutate function is used to create a new variable from a data set. In order to use the function, we need to install the dplyr package, which is an add-on to R that includes a host of cool functions for selecting, filtering, grouping, and arranging data.
dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg) , and why select(mtcars, "mpg") doesn't work. When you use dplyr in functions, you will likely want to use "standard evaluation".
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name.
In many cases it's sufficient to create a vectorized version of the function:
your_function_V <- Vectorize(your_function)
The vectorized function is then usable in a dplyr's mutate
. See also this blog post.
The function posted in the question however takes one two-dimensional input from two different columns. Therefore we need to modify this, so the inputs are individual, before we vectorize.
binom.test.p <- function(x, y) {
# input x and y
x <- c(x, y)
if (is.na(x[1])|is.na(x[2])|(x[1]+x[2])<10) {
return(NA)
}
else {
return(binom.test(x, alternative="two.sided")$p.value)
}
}
# vectorized function
binom.test.p_V <- Vectorize(binom.test.p)
table %>%
mutate(Ratio = binom.test.p_V(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))
# works!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With