Netezza says that its random() function generates a float between 0.0 and 0.99999... I'm trying to generate random ints within a range (e.g., between 5 and 10). The internet told me to use an equation like this to scale a fraction to a number in a range:
select f,count(*) from (
select CAST(5 + random() * (10-5) as INT) as f
from table_of_numbers
where number between 1 and 5000
) x group by 1 order by 1
However, when I use that code, the extreme values are under-represented in the sample:
F COUNT
5 486 <---
6 992
7 1057
8 1000
9 937
10 528 <---
Does anyone know how I can fix this?
Thanks!
The expression random()*(10-5)
produces numbers from 0 to 4.99999. However, you have a range of 6 values (5, 6, 7, 8, 9, and 10). So your expression is splitting 5 values across 6 buckets.
Your code is finding that the first and last are half full. Apparently, the cast()
operation is rounding the values rather than truncating them (I don't think this is ANSI SQL, but it would explain your observed results). This masks the problem.
Try this:
select CAST(4.5 + random() * (10-5+1) as INT)
In other databases, something like this should work:
select CAST(5 + random() * (10-5+1) as INT)
This slightly modified query produced an even distribution of random numbers from 5 to 10 (inclusively) on a big table (~70m rows) on my environment:
select f, count(*) from (
select 5 + floor(random() * 5.99)::int as f
from some_big_table
) x group by 1 order by 1;
F COUNT
5 11659920
6 11663534
7 11665070
8 11668845
9 11665256
10 11549193
The key here is to tell the system to round the random number down to the nearest integer value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With