I'm trying to solve the following problem in R: I have a dataframe with two variables (number of successes, and number of total trials).
# A tibble: 4 x 2
Success N
<dbl> <dbl>
1 28. 40.
2 12. 40.
3 22. 40.
4 8. 40.
I would like to perform a prop.test or binom.test on each row and add the resulting list to the dataframe (or certain elements of it, like the p-value and CIs).
Ideally, I would like to add a third column with the p-values and the CI range. My attempts so far were painfully unsuccessful. Here is a minimal coding example:
Success <- c( 38, 12, 27, 9)
N <- c( 50, 50, 50, 50)
df <- as_tibble(cbind(Success, N))
df %>%
map( ~ prop.test, x = .$Success, n = .$N)
This doesn't give the desired result. Any help would be much appreciated.
Cheers,
Luise
We can use pmap after renaming the columns to match the argument names of prop.test (x and n):
pmap(setNames(df, c("x", "n")), prop.test)
Or using map2
map2(df$Success, df$N, prop.test)
The problem with map is that it loops over the columns of the dataset, because a data frame is a list of vectors:
df %>%
map(~ .x)
#$Success
#[1] 38 12 27 9
#$N
#[1] 50 50 50 50
So inside the map call, .x is a whole column vector, and we cannot do .x$Success or .x$N.
As @Steven Beaupre mentioned, if we need to create new columns with the p-value and confidence interval:
res <- df %>%
  mutate(newcol = map2(Success, N, prop.test),
         pval = map_dbl(newcol, ~ .x[["p.value"]]),
         CI = map(newcol, ~ as.numeric(.x[["conf.int"]]))) %>%
  select(-newcol)
# A tibble: 4 x 4
# Success N pval CI
# <dbl> <dbl> <dbl> <list>
#1 38.0 50.0 0.000407 <dbl [2]>
#2 12.0 50.0 0.000407 <dbl [2]>
#3 27.0 50.0 0.671 <dbl [2]>
#4 9.00 50.0 0.0000116 <dbl [2]>
The 'CI' column is a list column whose elements are numeric vectors of length 2 (lower and upper bound). It can be unnested to put the data in 'long' format, with two rows per test:
res %>%
  unnest(cols = CI)
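One drawback of the long format is that the rows no longer say which value is the lower bound and which is the upper. A minimal sketch of one way to keep that information, assuming the same df as in the question; the names `bound` and `res_long` are made up for illustration:

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- tibble::tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

res_long <- df %>%
  # CI is a list column: one numeric vector of length 2 per row
  mutate(CI = map2(Success, N, ~ as.numeric(prop.test(.x, .y)[["conf.int"]])),
         # a length-1 list is recycled to every row, giving a parallel label column
         bound = list(c("lower", "upper"))) %>%
  # unnest both list columns together so values and labels stay aligned
  unnest(cols = c(CI, bound))

res_long
```

This should give eight rows (two per test), each tagged as "lower" or "upper".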
Or create three separate columns:
df %>%
  mutate(newcol = map2(Success, N, ~ prop.test(.x, n = .y) %>%
    {tibble(pvalue = .[["p.value"]],
            CI_lower = .[["conf.int"]][[1]],
            CI_upper = .[["conf.int"]][[2]])})) %>%
  unnest(cols = newcol)
# A tibble: 4 x 5
# Success N pvalue CI_lower CI_upper
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 38.0 50.0 0.000407 0.615 0.865
#2 12.0 50.0 0.000407 0.135 0.385
#3 27.0 50.0 0.671 0.395 0.679
#4 9.00 50.0 0.0000116 0.0905 0.319
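The question also mentions binom.test, and the same row-wise pattern carries over. Here is a minimal sketch, assuming the same df and the default null hypothesis of p = 0.5; the column names `pvalue`, `CI_lower`, `CI_upper` and the object `res_binom` are only illustrative. Because binom.test is an exact test with a Clopper-Pearson interval, the numbers will differ slightly from the prop.test output above.

```r
library(dplyr)
library(purrr)

df <- tibble::tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

res_binom <- df %>%
  # one exact binomial test per row, stored temporarily in a list column
  mutate(test     = map2(Success, N, ~ binom.test(x = .x, n = .y)),
         # pull the scalar pieces out of each htest object
         pvalue   = map_dbl(test, ~ .x[["p.value"]]),
         CI_lower = map_dbl(test, ~ .x[["conf.int"]][1]),
         CI_upper = map_dbl(test, ~ .x[["conf.int"]][2])) %>%
  select(-test)

res_binom
```

Using map_dbl for each bound keeps everything as plain numeric columns, so no unnest step is needed.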
If you want a new column, you'd use @akrun's approach but sprinkle in a little dplyr and broom amongst the purrr:
library(tidyverse) # for dplyr, purrr, tidyr & co.
library(broom)
analysis <- df %>%
  set_names(c("x", "n")) %>%
  mutate(result = pmap(., prop.test)) %>%
  mutate(result = map(result, tidy))
That gives you the results in a tidy nested tibble. If you want to limit it to certain variables, follow the same mutate/map pattern to apply functions to the nested frame, then unnest().
analysis %>%
mutate(result = map(result, ~select(.x, p.value, conf.low, conf.high))) %>%
unnest(cols = c(result))
# A tibble: 4 x 5
x n p.value conf.low conf.high
<dbl> <dbl> <dbl> <dbl> <dbl>
1 38.0 50.0 0.000407 0.615 0.865
2 12.0 50.0 0.000407 0.135 0.385
3 27.0 50.0 0.671 0.395 0.679
4 9.00 50.0 0.0000116 0.0905 0.319
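Note that pmap(., prop.test) relies on the column names matching the argument names and runs prop.test with its defaults: for a single group the null proportion is 0.5, the confidence level is 0.95, and the continuity correction is on. If you'd rather keep the original column names or change those settings, a lambda makes the call explicit. A sketch under those assumptions (the object name `tidy_res` is made up; the argument values shown are just the defaults written out):

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

df <- tibble::tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

tidy_res <- df %>%
  # test each row and tidy the htest object into a one-row tibble
  mutate(result = map2(Success, N,
                       ~ tidy(prop.test(x = .x, n = .y, p = 0.5,
                                        conf.level = 0.95, correct = TRUE)))) %>%
  unnest(cols = result) %>%
  # keep only the columns of interest from broom's output
  select(Success, N, estimate, p.value, conf.low, conf.high)

tidy_res
```

With the defaults spelled out like this, the p-values and intervals should match the table above; changing conf.level or correct in the lambda changes them for every row.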