I was responding to a question posed over at Reddit AskScience and came across something odd about the behavior of runif(). I was attempting to sample uniformly from the integers 1 to 52. My first thought was to use runif():
as.integer(runif(n, min = 1, max = 52))
However, I found that the operation never produced a value of 52. For example:
length(unique(as.integer(runif(1000000, 1, 52))))
[1] 51
For my purposes, I just turned to sample() instead:
sample(52, n, replace = TRUE)
In the runif() documentation it states:
runif will not generate either of the extreme values unless max = min or max-min is small compared to min, and in particular not for the default arguments.
I'm wondering why runif() acts this way. It seems like it should be able to produce the 'extreme values' of the range if it's attempting to generate samples uniformly. Is this a feature, and if so, why?
The runif() function generates random deviates from the uniform distribution; its signature is runif(n, min = 0, max = 1). It generates n random values within any interval defined by the min and max arguments.
To generate random numbers from a uniform distribution, use runif(). Alternatively, use sample() to take a random sample with or without replacement.
The name runif stands for "random uniform", not "run if". runif(n) generates n uniform random numbers between 0 and 1. runif(n, a, b) generates n uniform random numbers between a and b.
dunif() gives the density of the uniform distribution.
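As a small illustration of the calls described above, here is a minimal sketch. The seed and the interval bounds 2 and 10 are arbitrary choices made for the example, not values from the question.

```r
set.seed(1)                    # arbitrary seed, only for reproducibility

u <- runif(5)                  # five draws between 0 and 1
v <- runif(5, 2, 10)           # five draws between a = 2 and b = 10

all(u > 0 & u < 1)             # draws lie strictly inside (0, 1)
all(v > 2 & v < 10)            # draws lie strictly inside (2, 10)

dunif(3, min = 2, max = 10)    # density inside the interval: 1 / (10 - 2) = 0.125
dunif(11, min = 2, max = 10)   # density outside the interval: 0
```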
This is indeed a feature. The C source code of runif contains the following:
/* This is true of all builtin generators, but protect against
user-supplied ones */
do {u = unif_rand();} while (u <= 0 || u >= 1);
return a + (b - a) * u;
This implies that unif_rand() could return 0 or 1, but runif() is engineered to skip those (unlikely) cases.
My guess would be that this is done to protect user code that would fail in the edge cases (values exactly on the boundaries of the range).
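To make the rejection loop concrete, here is a rough R sketch of the same idea. The names runif_open, gen, and bad_gen are hypothetical helpers invented for this illustration; they are not part of R, and the real logic lives in the C code quoted above.

```r
# Hypothetical helper mirroring the C loop above; not part of R itself.
# `gen` stands in for unif_rand() and is assumed to return one number.
runif_open <- function(a, b, gen = function() runif(1)) {
  repeat {
    u <- gen()
    if (u > 0 && u < 1) break    # reject draws that fall exactly on a boundary
  }
  a + (b - a) * u
}

# A deliberately misbehaving generator: returns 0, then 1, then 0.5
vals <- c(0, 1, 0.5)
i <- 0
bad_gen <- function() {
  i <<- i + 1
  vals[i]
}

runif_open(1, 52, gen = bad_gen)  # skips 0 and 1, uses u = 0.5, giving 26.5
```

Because the loop discards u = 0 and u = 1, the result is always strictly between a and b, which is why 52 itself can never come out.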
This feature was implemented by Brian Ripley on Sep 19 2006 (from the comments it seems that 0<u<1 is automatically true of the built-in uniform generator, but might not be true for user-supplied ones).
sample(1:52,size=n,replace=TRUE) is an idiomatic (although not necessarily the most efficient) way to achieve your goal.
as.integer works like trunc: it forms an integer by truncating the given value toward 0. Since runif never returns exactly 52 (see Ben's answer), every value is strictly below 52 and is truncated to an integer between 1 and 51.
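A quick way to see the truncation behavior is to compare the three functions on a few hand-picked values. The vector x below is made up for the example.

```r
x <- c(1.2, 51.999, -1.7)   # illustrative values, not from the question

as.integer(x)   # 1 51 -1  (truncates toward 0)
trunc(x)        # 1 51 -1  (same direction as as.integer)
floor(x)        # 1 51 -2  (rounds toward -Inf, so it differs for negatives)

# Any draw strictly below 52 truncates to at most 51
draws <- as.integer(runif(1000, min = 1, max = 52))
max(draws) <= 51
```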
You would see a different result with floor (or ceiling). Note that you have to adjust the max of runif by adding 1 (or lower min by 1 in the case of ceiling). Also note that in this case, since both min and max are above 0, you could replace floor with trunc or as.integer too.
set.seed(42)
x = floor(runif(n = 1000000, min = 1, max = 52 + 1))
plot(prop.table(table(x)), las = 2, cex.axis = 0.75)
as.integer(51.999)
[1] 51
It is because of how as.integer works.
If you want to draw from a discrete distribution, then use sample. runif is not for discrete distributions.
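For instance, a minimal check that sample covers the whole range, endpoints included, might look like this. The seed and the number of draws are arbitrary choices for the sketch.

```r
set.seed(42)                               # arbitrary seed for reproducibility
draws <- sample(52, 1e5, replace = TRUE)   # 100,000 draws from 1..52

range(draws)            # both endpoints can occur, unlike as.integer(runif(...))
length(unique(draws))   # with this many draws, all 52 values are expected to appear
```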