Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gaussian random distribution in Postgresql

I have a table of let's say 250 URLs :

create table url (
  id serial,
  url varchar(64)
)

These URLs correspond each to a website. Each of the websites have a different popularity. Let's say that the id=125 (the one centered on the gaussian) is the most popular, the ones at id=1 or id=250 are the least popular.

I want to populate a "log" table like the following one with a value of url among the ones provided in the "url" table, but taking into account that different URLS might appear more frequently (for exemple url whose id is 125 will be the most popular).

create table log (
  id serial,
  url_id integer
)

I want to avoid using random() since it is uniform and not very "real".

How can this be achieved with Postgresql ?

like image 300
SCO Avatar asked Feb 24 '12 13:02

SCO


1 Answers

The sum of 12 uniform distributions on the range [0, 1) is a good approximation to a Gaussian distribution bounded in the range [0, 12). This can then easily be re-scaled by multiplying by a constant and then adding/subtracting a constant.

select
    random() + 
    random() + 
    random() +
    random() + 
    random() + 
    random() +
    random() + 
    random() + 
    random() +
    random() + 
    random() + 
    random();

http://books.google.com/books?id=EKA-yeX2GVgC&pg=PA185&lpg=PA185&dq=%22sum+of+12+uniform+random+variables%22&source=bl&ots=YfwwE0fBB3&sig=HX9J9Oe6x316kVL8uamDU_GOsn4&hl=en&sa=X&ei=bJLZUur1GozaqwGHm4DQDQ&ved=0CEUQ6AEwAw#v=onepage&q=%22sum%20of%2012%20uniform%20random%20variables%22&f=false

like image 175
user2179977 Avatar answered Sep 16 '22 17:09

user2179977