
How to model random variables?

I want to know how to model random variables using "basic operations". The only random function I know, at least for C, is rand(), along with srand() for seeding. Packages for this probably exist somewhere online, but let's say I want to implement it on my own. I don't know if there are other very common random functions, but if not, let's just stick with rand() and the C language.

rand() allows me to pseudo-randomly generate an int from 0 to RAND_MAX. I can then use mod to get an int in some range. I can next mod 2 to choose a sign and get negative numbers. I can also do rand()/RAND_MAX to model values in the interval (0,1) and shift this to model Uniform(a,b).

What I am not sure about is whether I can extend this to model any probability distribution, and at what point I have to worry about accuracy, especially when dealing with infinities and irrational probabilities. This method is also very crude, so I would like to know of more standard approaches that use basic tools, if there are any.

A simple example:

I have the random variable X such that Pr(X = 1)=1/pi and Pr(X=0)=1-1/pi. Since pi is irrational, I would approximate the probability of getting 1/pi with rand() and choose X=1 if I get an int from 0 to Round(RAND_MAX*1/pi). So this is approximating twice, once for pi and another time for rounding.

Is there a better approach? How would one go about modeling something more complicated, such as a continuous random variable on the interval (0,infinity) or a discrete random variable with irrational probabilities on a countably infinite set? Would my approach still work, or would I have to worry about rounding errors?

EDIT: Also, how does the fact that rand() is pseudo-random rather than truly random change things, and how would I account for that?

domoremath asked Mar 09 '23 19:03


1 Answer

I can then use mod to get an int in some range

No, you can't. Try it with dice. You want a number between 1 and 5, so you take the roll mod 5 (strictly speaking, ((roll-1)%5)+1). This maps 1 to 1, 2 to 2, and so on up to 5 to 5, but 6 also maps to 1. You now have 1 twice as likely as any other result.
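The dice argument can be checked by brute force. The sketch below is only an illustration; the function name mod_bias_counts is made up for this example. It pushes each of the six faces through the mapping once and counts where they land.

```c
/* Count how often each result 1..5 appears when every die face 1..6
   is reduced with ((roll - 1) % 5) + 1. counts[0] stays unused.
   The function name is hypothetical, chosen for this illustration. */
void mod_bias_counts(int counts[6])
{
    for (int i = 0; i < 6; i++)
        counts[i] = 0;

    for (int roll = 1; roll <= 6; roll++)
        counts[((roll - 1) % 5) + 1]++;
}
```

After the call, counts[1] is 2 while counts[2] through counts[5] are each 1, which is exactly the bias described above.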

A correct way of doing this is to find the smallest power of 2 that is at least as large as your range, mask off the bits of the random number above that power of 2, then check whether the result is in range. If it isn't, try again. In theory this can loop forever, but in practice the average number of attempts is less than 2, because more than half of the masked values fall in range. This assumes that your random numbers are a stream of uniformly distributed bits and not something else, which is usually a safe assumption for decent generators.
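A minimal sketch of the mask-and-retry idea in C follows. The helper names random_bits32 and uniform_below are invented for this example. The sketch assumes that the low 15 bits of rand() are uniformly distributed, which holds when RAND_MAX + 1 is a power of two and the generator is decent; that is an assumption, not something the C standard promises.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: build 32 random bits from three rand() calls,
   keeping only the low 15 bits of each call (RAND_MAX is at least 32767).
   Assumes those 15 bits are uniform. */
static uint32_t random_bits32(void)
{
    uint32_t r = (uint32_t)rand() & 0x7FFFu;
    r = (r << 15) | ((uint32_t)rand() & 0x7FFFu);
    r = (r << 15) | ((uint32_t)rand() & 0x7FFFu);
    return r;
}

/* Uniform integer in [0, n) for n >= 1. */
uint32_t uniform_below(uint32_t n)
{
    /* Smear the highest set bit of n - 1 downwards to get a mask that
       covers the smallest power of two that is >= n. */
    uint32_t mask = n - 1;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;

    uint32_t r;
    do {
        r = random_bits32() & mask;   /* drop the bits above the mask */
    } while (r >= n);                 /* out of range: try again */
    return r;
}
```

For the dice case, 1 + uniform_below(5) gives a value from 1 to 5 with no face favoured, at the cost of an occasional retry.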

I can also do rand()/RAND_MAX to model values in the interval (0,1)

No, you can't. That's not how floating point numbers work. This generates a horrible distribution.

There are two cases. If the integer has fewer bits than the floating point mantissa, there is a whole set of floating point numbers in the interval that you can never generate. If the integer has more bits than the mantissa, the integer gets rounded when it is converted to floating point before the division, so several integers collapse onto the same value and certain numbers come up much more often than others.
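The second case is easy to observe directly. This small check is only an illustration, and the function name is made up. It assumes the usual IEEE 754 double with a 53-bit significand, so two neighbouring 64-bit integers cannot both be represented.

```c
#include <stdint.h>

/* Returns 1 if two different 64-bit integers convert to the same double.
   The volatile stores force the values down to real double precision even
   on platforms that keep intermediates in wider registers. */
int adjacent_ints_collide(void)
{
    uint64_t x = UINT64_MAX;
    volatile double a = (double)x;
    volatile double b = (double)(x - 1);
    return a == b;
}
```

On such a platform the function returns 1: both integers are rounded to the same double before any division takes place.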

in the interval (0,1) and shift this to model Uniform(a,b).

This makes things even worse. First you lose bits in one direction, then you lose bits in the other direction.

Generating uniformly distributed floating point numbers in an arbitrary range is harder than it looks.

I did some experiments to figure this out myself a few years ago, learning floating point internals in the process, and wrote some heavily commented code that explains the reasoning here: https://github.com/art4711/random-double

In short, to generate random floating point numbers in an arbitrary range: take the end of the range with the bigger absolute value. That is the start; the other end of the range is the end. Find the next representable number going from start towards end. The absolute difference between start and that next number becomes the step. Calculate how many steps fit between start and end. Generate a uniformly distributed random integer between 0 and the number of steps. The answer is start plus (or minus, depending on direction) step times that random integer. Because of how floating point works, this might not be exactly what you're looking for: numbers closer to zero are more densely packed than the step, so not every floating point value in the range can be generated by this method (except in very special cases). What the method does guarantee is that every value it can generate is equally likely.

Notice that your misconceptions are very common. Almost everyone does those things. Random numbers in the industry are anything but random. The word random in computer science pretty much means "predictable, repeatable, easily breakable and exploitable, quite possibly not well distributed". And don't get me started on the quality of the "random" number generators in standard libraries. If you dig around my github stuff, you'll find a package for Go with a long README rant about this.

I'm not going to respond to the rest of your question; those bits require a book or two.

Art answered Mar 12 '23 08:03