I have the following dataframe:
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)
And I want to end up with something like this table (values are random):
Species Pop.density %Resistance CI_low CI_high Total samples
avi low 2.0 1.2 2.2 30
avi med 0 0 0.5 20
avi high 3.5 2.9 4.2 10
chi low 0.5 0.3 0.7 20
chi med 2.0 1.9 2.1 150
chi high 6.5 6.2 6.6 175
The % resistance column is based on the col3 above, where 1 = resistant, and 0 = nonresistant. I have tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)
I tried this, and it seem to almost do the trick, as i get the percentage of total 1s and 0s in col3, for each value in col1 and 2, however the total samples is wrong since I am counting all three columns, when the correct count would be for only col1 and 2.
For the confidence interval i would use the following:
binom.test(resistant samples,total samples)$conf.int*100
However I am not sure how to implement it together with the rest. Is there a simple and quick way to do this?
I'd do...
library(data.table)
setDT(DT)
DT[, {
bt <- binom.test(sum(resists), .N)$conf.int*100
.(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, keyby=.(species, popdens)]
species popdens res_rate res_lo res_hi n
1: avi low 0.00000 0.000000 70.75982 3
2: avi med 0.00000 0.000000 97.50000 1
3: bov low 100.00000 15.811388 100.00000 2
4: bov med 50.00000 1.257912 98.74209 2
5: bov high 100.00000 15.811388 100.00000 2
6: chi low 0.00000 0.000000 97.50000 1
7: chi med 50.00000 1.257912 98.74209 2
8: chi high 66.66667 9.429932 99.15962 3
9: fox low 0.00000 0.000000 97.50000 1
10: fox med 50.00000 1.257912 98.74209 2
To include all levels (combinations of species and population density)...
DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), {
bt <-
if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100
else NA_real_
.(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, by=.EACHI]
species popdens res_rate res_lo res_hi n
1: avi low 0.00000 0.000000 70.75982 3
2: avi med 0.00000 0.000000 97.50000 1
3: avi high NA NA NA 0
4: bov low 100.00000 15.811388 100.00000 2
5: bov med 50.00000 1.257912 98.74209 2
6: bov high 100.00000 15.811388 100.00000 2
7: chi low 0.00000 0.000000 97.50000 1
8: chi med 50.00000 1.257912 98.74209 2
9: chi high 66.66667 9.429932 99.15962 3
10: fox low 0.00000 0.000000 97.50000 1
11: fox med 50.00000 1.257912 98.74209 2
12: fox high NA NA NA 0
How it works
The syntax is DT[i, j, by=]
where ...
i
determines a subset of rows, sometimes with helper arguments, on=
or roll=
.by=
determines groups within the subsetted table, switched to keyby=
if sorting.j
is code acting on each group. j
should evaluate to a list, with .()
being a shortcut to list()
. See ?data.table
for details.
Data used
(renamed columns, reformatted binary variable back to 0/1 or false/true, set levels of population density in the right order):
DT = data.frame(
species = col1,
popdens = factor(col2, levels=c("low", "med", "high")),
resists = col3
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With