generate random integers between two values with a given probability using R

Tags:

r

I have the following four number sets:

A=[1,207];
B=[208,386];
C=[387,486];
D=[487,586].

I need to generate 20000 random numbers between 1 and 586 such that the probability that a generated number belongs to A is 1/2, and the probability that it belongs to each of B, C, and D is 1/6.

How can I do this in R?

asked Jun 08 '13 17:06

2 Answers

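You can use sample directly, more specifically its prob argument. Spread each group's probability evenly over the numbers in that group, so that each of the 586 numbers gets its own weight: every number in A gets a weight of 0.5/207, every number in B gets (1/6)/179, and so on.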

A <- 1:207
B <- 208:386
C <- 387:486
D <- 487:586
L <- sapply(list(A, B, C, D), length)

x <- sample(c(A, B, C, D),
            size = 20000,
            prob = rep(c(1/2, 1/6, 1/6, 1/6) / L, L),
            replace = TRUE)
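As a quick sanity check on the approach above, here is a minimal sketch that confirms the per-number weights sum to 1 and then compares the observed share of draws in each group with the target probabilities. The names groups, group_prob, weights, and observed are my own, and the seed is arbitrary; it is only there to make the check reproducible.

```r
set.seed(42)  # arbitrary seed, only so the check is reproducible

groups <- list(A = 1:207, B = 208:386, C = 387:486, D = 487:586)
group_prob <- c(A = 1/2, B = 1/6, C = 1/6, D = 1/6)
sizes <- lengths(groups)

# Per-number weight: each group's probability spread evenly over its members
weights <- rep(group_prob / sizes, times = sizes)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

x <- sample(unlist(groups, use.names = FALSE), size = 20000,
            replace = TRUE, prob = weights)

# Share of draws landing in each group; the breaks follow the group boundaries
observed <- prop.table(table(cut(x, breaks = c(0, 207, 386, 486, 586),
                                 labels = names(groups))))
print(round(observed, 3))
```

The printed shares should be close to 0.5 for A and close to 0.167 for each of B, C, and D, with small deviations from run to run.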

answered Oct 23 '22 15:10

I would suggest the roulette selection method. Here is a brief explanation. Take a line of length 1 and cut it into pieces whose lengths are proportional to the probabilities. In our case the first piece has length 1/2 and the next three pieces have length 1/6 each. Now sample a number between 0 and 1 from the uniform distribution. Because every point on the line is equally likely, the probability that the sampled number falls in a given piece equals the length of that piece. Whichever piece the number falls in, sample one value from the corresponding vector. (The R code below can be run for a large number of draws to check that this holds.)

It is called roulette selection because of another analogy for the same situation: take a circle and split it into sectors whose angles are proportional to the probabilities. Sample a number from the uniform distribution again, see which sector it falls in, and then sample from the corresponding vector, with every element of that vector equally likely.

A <- 1:207
B <- 208:386
C <- 387:486
D <- 487:586

cumList <- list(A,B,C,D)

probVec <- c(1/2,1/6,1/6,1/6)

cumProbVec <- cumsum(probVec)

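ret <- integer(20000)  # pre-allocate instead of growing the vector inside the loop

for (i in 1:20000) {
  rand <- runif(1)                          # uniform draw on (0, 1)
  whichVec <- which(rand < cumProbVec)[1]   # first piece whose upper end exceeds the draw
  ret[i] <- sample(cumList[[whichVec]], 1)  # uniform draw within the chosen group
}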

# Testing the results

sum(ret %in% A) # Roughly 1/2 * 20000 of the values

sum(ret %in% B) # Roughly 1/6 * 20000 of the values

sum(ret %in% C) # Roughly 1/6 * 20000 of the values

sum(ret %in% D) # Roughly 1/6 * 20000 of the values
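The same two-stage idea (pick a piece, then pick a number within it) can also be written without looping over every draw. The following is only a sketch under my own naming (groups, group_prob, cuts, pick), not the answer's original code. It uses findInterval to locate each uniform number among the cumulative cut points, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducibility

groups <- list(A = 1:207, B = 208:386, C = 387:486, D = 487:586)
group_prob <- c(1/2, 1/6, 1/6, 1/6)
n <- 20000

# Stage 1: choose a group for every draw at once.
# findInterval() counts how many cut points lie at or below each uniform
# number, so adding 1 gives the index of the piece the number falls in.
u <- runif(n)
cuts <- cumsum(group_prob)[-length(group_prob)]  # 1/2, 2/3, 5/6
pick <- findInterval(u, cuts) + 1

# Stage 2: choose one member uniformly within the chosen group.
# The loop runs once per group (4 times), not once per draw.
ret <- integer(n)
for (g in seq_along(groups)) {
  idx <- which(pick == g)
  members <- groups[[g]]
  # Index by position so a one-element group would still be handled correctly
  ret[idx] <- members[sample.int(length(members), length(idx), replace = TRUE)]
}

# Observed share of draws per group
print(round(prop.table(table(pick)), 3))
```

Dropping the last cumulative value (which is 1) from cuts means that a small floating-point error in the cumulative sum cannot leave a draw without a group.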

answered Oct 23 '22 13:10

Avinash

dp.carlo

Paul Hiemstra

Avinash
