How does RAND() work in BigQuery?

I am trying to find the best sampling practice in BigQuery. My dataset is quite big (11B rows), and its distribution tends to be skewed. So far I've been exploring these two options:

  1. HASHING: I take the hash of a certain value to select the sample. This is a pretty straightforward approach, and the mechanics behind it are clear. My question is about the second option:
  2. Using the RAND() function. I understand how to use it from the BigQuery reference (https://cloud.google.com/bigquery/docs/reference/legacy-sql#rand), but I have no idea how exactly this function works.
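To make the hashing option concrete, here is a minimal Python sketch of hash-based sampling. It assumes each row has a string key (the `user_…` keys are made up), and it uses `hashlib.md5` as a stand-in for BigQuery's FARM_FINGERPRINT, which is not in Python's standard library, so the bucket numbers will not match what BigQuery computes. In BigQuery the equivalent filter would look something like `MOD(FARM_FINGERPRINT(key_column), 10) = 0`, where `key_column` is a hypothetical STRING column.

```python
import hashlib


def hash_bucket(key: str, buckets: int = 10) -> int:
    """Map a key to a bucket in [0, buckets) using a stable hash.

    hashlib.md5 is only a stand-in for FARM_FINGERPRINT; the resulting
    bucket numbers will differ from BigQuery's.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it.
    value = int.from_bytes(digest[:8], "big")
    return value % buckets


def hash_sample(keys, buckets: int = 10, keep_bucket: int = 0):
    """Keep the keys whose bucket equals keep_bucket (about 1/buckets of them)."""
    return [k for k in keys if hash_bucket(k, buckets) == keep_bucket]


if __name__ == "__main__":
    user_ids = [f"user_{i}" for i in range(100_000)]  # hypothetical keys
    first = hash_sample(user_ids)
    second = hash_sample(user_ids)
    # The bucket depends only on the key, so both runs pick the same rows.
    print(first == second, len(first) / len(user_ids))
```

Because the selection depends only on the key, rerunning the sample returns exactly the same rows, and all rows sharing a key land in or out of the sample together.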

Can anyone shed some more light on what is happening behind the scenes?

Thanks a lot, Gallory

My answer applies to BigQuery Standard SQL. The RAND() function generates a pseudo-random value of type FLOAT64 in the range [0, 1), inclusive of 0 and exclusive of 1. It is evaluated separately for each row, so you can use it for sampling much as you would use the FARM_FINGERPRINT function, except that you don't need to specify any existing key. Because RAND() values are uniformly distributed and do not depend on the row's contents, every row has the same chance of being selected; if some columns are skewed, the same skew is expected in the sample. For example, filtering with `WHERE RAND() < 0.1` keeps roughly 10% of the rows in a table.
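As a local illustration of what `WHERE RAND() < 0.1` does, here is a minimal Python sketch. It assumes Python's `random.random()`, which also returns floats in [0.0, 1.0), as a stand-in for RAND(); it is not BigQuery's generator. The table name `project.dataset.table` and the skewed `US`/`NZ` rows are made-up placeholders. Each row is kept independently with probability 0.1, so the sample holds approximately, not exactly, 10% of the rows, and the 90/10 skew carries over into it.

```python
import random
from collections import Counter


def rand_sample_query(table: str, fraction: float) -> str:
    """Build a Standard SQL query keeping each row with probability `fraction`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return f"SELECT * FROM `{table}` WHERE RAND() < {fraction}"


def bernoulli_sample(rows, fraction: float, rng: random.Random):
    """Local analogue of WHERE RAND() < fraction.

    rng.random() returns a float in [0.0, 1.0), the same half-open range as
    RAND(), so each row is kept independently with probability `fraction`.
    """
    return [row for row in rows if rng.random() < fraction]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make this demo repeatable
    # Skewed toy data (made up): 90% of rows are "US", 10% are "NZ".
    rows = ["US"] * 90_000 + ["NZ"] * 10_000
    sample = bernoulli_sample(rows, 0.1, rng)
    counts = Counter(sample)
    print(rand_sample_query("project.dataset.table", 0.1))
    # Sample size is close to 10,000 and the US share stays close to 0.9.
    print(len(sample), counts["US"] / len(sample))
```

Unlike hash-based sampling, BigQuery Standard SQL's RAND() takes no seed argument, so each run of such a query can select a different set of rows; the fixed seed above exists only to make the local demo repeatable.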

