I would like to use mutate to calculate a column using the binomial distribution.
I have the following example:
library("dplyr")
d = data.frame(ref = rbinom(100,100,0.5))
d$coverage = 100
d$prob = 0.5
d$eprob= d$ref / d$coverage
d = tbl_df(d)
mutate(d,
       ref1 = ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2 = rbinom(1, coverage, eprob),
       ref3 = rbinom(1, cov1, eprob1)
)
The result looks like this:
Source: local data frame [100 x 9]

   ref coverage prob eprob ref1 cov1 eprob1 ref2 ref3
1   52      100  0.5  0.52   52  100   0.52   45   44
2   50      100  0.5  0.50   50  100   0.50   45   44
3   45      100  0.5  0.45   45  100   0.45   45   44
4   45      100  0.5  0.45   45  100   0.45   45   44
5   47      100  0.5  0.47   47  100   0.47   45   44
6   46      100  0.5  0.46   46  100   0.46   45   44
7   50      100  0.5  0.50   50  100   0.50   45   44
8   53      100  0.5  0.53   53  100   0.53   45   44
9   44      100  0.5  0.44   44  100   0.44   45   44
10  56      100  0.5  0.56   56  100   0.56   45   44
I don't get it. I want mutate to return, for each row, a random number drawn from the binomial distribution given by ref and coverage (the "ref2" column), but every row gets the same value.
mutate reads the columns correctly, but something weird happens when calling rbinom.
Any help is appreciated.
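Try changing the n argument of rbinom. With n = 1, rbinom returns a single value, which mutate then recycles to every row. Setting n to the number of rows draws one value per row: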
mutate(d,
       ref1 = ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2 = rbinom(100, coverage, eprob),
       ref3 = rbinom(100, cov1, eprob1)
)
Or more generally, using n() so the row count is not hard-coded:
mutate(d,
       ref1 = ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2 = rbinom(n(), coverage, eprob),
       ref3 = rbinom(n(), cov1, eprob1)
)
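To see why the n argument matters, here is a minimal base R sketch. The vectors size and prob and the seed are made up for illustration; they are not part of the question's data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical per-row parameters, standing in for the coverage and eprob columns
size <- c(100, 100, 100)
prob <- c(0.1, 0.5, 0.9)

# n = 1: a single draw, using only the first element of size and prob
one_draw <- rbinom(1, size, prob)

# n = number of rows: one draw per element, each with its own size and prob
per_row <- rbinom(length(size), size, prob)

length(one_draw)  # 1
length(per_row)   # 3
```

Inside mutate, a length-1 result like one_draw is recycled to fill the whole column, which is why every row showed the same number.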
Another solution would be:
d %>%
  rowwise() %>%
  mutate(ref1 = ref,
         cov1 = coverage,
         eprob1 = eprob,
         ref2 = rbinom(1, coverage, eprob),
         ref3 = rbinom(1, cov1, eprob1))
Here rowwise() groups the data by row, so rbinom(1, ...) is evaluated once per row and draws one random value for each row.
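As a quick sanity check that both approaches use each row's own parameters, here is a small sketch on a made-up data frame (toy is hypothetical, not the question's d). It sets eprob to 0 or 1 so the draws are deterministic: a probability of 0 always gives 0 successes, and a probability of 1 always gives coverage successes.

```r
library(dplyr)

# Hypothetical toy data: probabilities of 0 and 1 make rbinom deterministic
toy <- data.frame(coverage = c(10, 10, 10), eprob = c(0, 1, 0))

# Vectorised version: one call to rbinom producing n() draws
vec <- mutate(toy, draw = rbinom(n(), coverage, eprob))

# Row-wise version: one call to rbinom per row, each producing a single draw
row <- toy %>%
  rowwise() %>%
  mutate(draw = rbinom(1, coverage, eprob)) %>%
  ungroup()  # drop the row-wise grouping once it is no longer needed

vec$draw  # 0 10 0
row$draw  # 0 10 0
```

On larger data the vectorised n() version is likely the better default, since it calls rbinom once rather than once per row.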