dplyr mutate using rbinom does not return random numbers

Tags: r, dplyr

I would like to use mutate to calculate a column using the binomial distribution.

I have the following example:

library("dplyr")

d = data.frame(ref = rbinom(100,100,0.5))
d$coverage = 100
d$prob = 0.5
d$eprob= d$ref / d$coverage
d = tbl_df(d)

mutate(d,
       ref1= ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2=rbinom(1, coverage, eprob),
       ref3=rbinom(1, cov1, eprob1)
       )

Result is like this:

Source: local data frame [100 x 9]

   ref coverage prob eprob ref1 cov1 eprob1 ref2 ref3
1   52      100  0.5  0.52   52  100   0.52   45   44
2   50      100  0.5  0.50   50  100   0.50   45   44
3   45      100  0.5  0.45   45  100   0.45   45   44
4   45      100  0.5  0.45   45  100   0.45   45   44
5   47      100  0.5  0.47   47  100   0.47   45   44
6   46      100  0.5  0.46   46  100   0.46   45   44
7   50      100  0.5  0.50   50  100   0.50   45   44
8   53      100  0.5  0.53   53  100   0.53   45   44
9   44      100  0.5  0.44   44  100   0.44   45   44
10  56      100  0.5  0.56   56  100   0.56   45   44

I don't get it - I want the mutate function to return a random number drawn from the binomial distribution given by ref and coverage (the "ref2")...

mutate reads the columns correctly, but something weird happens when calling rbinom: ref2 and ref3 contain the same value in every row.

Any help is appreciated.

pallevillesen asked Aug 07 '15 13:08


2 Answers

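Try changing the n argument of rbinom. With n = 1, rbinom returns a single number, which mutate then recycles across all rows. Ask for one draw per row instead: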

mutate(d,
   ref1= ref,
   cov1 = coverage,
   eprob1 = eprob,
   ref2=rbinom(100, coverage, eprob),
   ref3=rbinom(100, cov1, eprob1)
)

Or more generally, use n(), which gives the number of rows in the current group, so the row count is not hard-coded:

mutate(d,
   ref1= ref,
   cov1 = coverage,
   eprob1 = eprob,
   ref2=rbinom(n(), coverage, eprob),
   ref3=rbinom(n(), cov1, eprob1)
)
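
The reason this works is that rbinom is vectorised over its size and prob arguments, while n only sets how many values come back. A minimal base R sketch of that behaviour follows; the vectors size and prob and the seed are made-up illustration values, not taken from the question:

```r
# Illustration only: size, prob and the seed are arbitrary example values.
set.seed(1)
size <- c(100, 100, 100, 100)
prob <- c(0.1, 0.4, 0.6, 0.9)

# n = 1: a single draw, based on the first elements of size and prob.
one <- rbinom(1, size, prob)

# n = length(size): one draw per element, pairing size[i] with prob[i].
many <- rbinom(length(size), size, prob)

length(one)   # 1
length(many)  # 4
```

Inside mutate, the length-1 result is what gets repeated down the column, which matches the constant ref2 and ref3 values in the question.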

Alex answered Nov 17 '22 06:11


Another solution would be:

d %>% rowwise() %>%
      mutate(ref1= ref,
             cov1 = coverage,
             eprob1 = eprob,
             ref2=rbinom(1, coverage, eprob),
             ref3=rbinom(1, cov1, eprob1))

Here rowwise() groups the data by row, so mutate evaluates rbinom(1, ...) once per row and draws one random value for each row from that row's own coverage and eprob.
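
One thing to keep in mind is that the result stays grouped by row until you call ungroup(), which can affect later summarise or mutate steps. A small sketch, assuming a made-up five-row data frame (the column values and seed are illustrative only):

```r
library(dplyr)

# Illustration only: a small made-up data frame and an arbitrary seed.
set.seed(42)
d_small <- data.frame(coverage = rep(100, 5),
                      eprob    = c(0.1, 0.3, 0.5, 0.7, 0.9))

res <- d_small %>%
  rowwise() %>%                                  # treat every row as its own group
  mutate(ref2 = rbinom(1, coverage, eprob)) %>%  # one draw per row
  ungroup()                                      # drop the row-wise grouping again

res
```

For large data frames the vectorised form rbinom(n(), coverage, eprob) is usually the faster of the two, since rowwise() calls rbinom once per row.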

AntoniosK answered Nov 17 '22 05:11