
Generate a Bernoulli variable from vector with probabilities [r]

I'm stuck on what is probably a basic problem. I searched for existing threads on the same issue but couldn't find any.

I'm trying to figure out how to generate a Bernoulli variable (y) from probabilities (z) that I have generated for each observation. I've created the made-up dataset below to illustrate the problem.

x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

df <- data.frame(x, z)

I want to add a binary variable y, where each row's value is drawn using that row's probability in z.

I tried the following:

df <- df %>%
  mutate(y = rbinom(1,1,z))

But this seems to give the same value to every observation, rather than drawing each one from that observation's own probability.

Does anyone know how to solve this?

Thanks!

ecl asked Feb 18 '26 02:02


1 Answer

From the online documentation for rbinom:

rbinom(n, size, prob)
n: number of observations. If length(n) > 1, the length is taken to be the number required.

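So rbinom(1, 1, z) returns a single draw, which mutate() then recycles to every row. Ask for one draw per row instead: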

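library(dplyr)

df <- df %>%
  mutate(y = rbinom(nrow(df), 1, z))  # one draw per row, each using that row's z
df
# output from one run; values vary unless a seed is set
  x     z y
1 A 0.110 0
2 B 0.230 1
3 C 0.250 0
4 D 0.060 0
5 E 0.100 0
6 F 0.032 0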
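
The difference between the two calls can also be checked directly in base R. This is only a minimal sketch: the seed value and the names single and per_row are arbitrary choices for illustration, not part of the original code.

```r
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# n = 1: a single draw is returned, no matter how long z is
single <- rbinom(1, 1, z)
length(single)   # 1

# n = length(z): one draw per element of z, each with its own probability
per_row <- rbinom(length(z), 1, z)
length(per_row)  # 6
```

Inside mutate(), the length-1 result is what gets repeated down the column, which is why every row ended up with the same value.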

To demonstrate that events are generated with the correct probabilities:

# Repeat each observation 500 times so the group means can be compared with z
df <- data.frame(x=rep(x, each=500), z=rep(z, each=500))
df <- df %>%
  mutate(y = rbinom(nrow(df), 1, z))
df %>% group_by(x) %>% summarise(y=mean(y), .groups="drop")
# A tibble: 6 x 2
  x         y
  <fct> <dbl>
1 A     0.114
2 B     0.232
3 C     0.25 
4 D     0.06 
5 E     0.106
6 F     0.018
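
How close the group means should land to z depends on the number of draws. As a rough sketch in base R (no dplyr needed), each estimate can be compared with the standard error of a proportion, sqrt(p * (1 - p) / n). The seed and the names reps, sim, est, se and check are illustrative assumptions; reps = 500 simply mirrors the example above.

```r
x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)
reps <- 500  # draws per group, matching the example above

set.seed(1)  # arbitrary seed for repeatability

sim <- data.frame(x = rep(x, each = reps), z = rep(z, each = reps))
sim$y <- rbinom(nrow(sim), 1, sim$z)

# Empirical proportion of 1s per group, reordered to match x
est <- as.numeric(tapply(sim$y, sim$x, mean)[x])

# Standard error of a proportion estimated from `reps` Bernoulli draws
se <- sqrt(z * (1 - z) / reps)

# Deviation of each estimate from its target, in standard-error units
check <- data.frame(x = x, target = z, estimate = est, se = se,
                    z_score = (est - z) / se)
check
```

Most z_score values would be expected to fall within roughly plus or minus 2. Increasing reps shrinks se, so the estimates tighten around z.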
Limey answered Feb 19 '26 15:02