I am trying to find the best sampling practice in BigQuery. My dataset is quite big (11B rows), but the distribution tends to be skewed. So far I've been exploring these two options:

- TABLESAMPLE (block-level sampling)
- WHERE RAND() < K (row-level sampling)
Can anyone shed some more light on what is happening in the background with each of these?
Thanks a lot, Gallory
If you want to sample individual rows rather than data blocks, you can use a WHERE RAND() < K clause instead of the TABLESAMPLE clause. However, this approach requires BigQuery to scan the entire table, which increases your cost. To stay within budget while still benefiting from row-level sampling, you can combine both techniques, as shown below.
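A minimal sketch of the combined approach (the table name and percentages are placeholders): TABLESAMPLE first restricts the scan to a subset of blocks, and the RAND() filter then samples rows within that subset.

-- Scan only ~10% of the table's blocks, then keep ~1% of the scanned rows,
-- so BigQuery bills for roughly a tenth of the table while the final
-- sample is still drawn at the row level.
SELECT *
FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)
WHERE RAND() < 0.01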
As per https://www.json.org/json-en.html, a valid JSON value can only be a string (in double quotes), a number, an object, an array, true, false, or null. NaN is not among these, so BigQuery considers it an invalid value and interprets it as null.
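As a quick illustration, here is a sketch assuming the GoogleSQL PARSE_JSON function is available in your project; the SAFE. prefix makes it return NULL instead of raising an error on invalid input.

-- The first literal is valid JSON; the second contains a bare NaN,
-- which is not a legal JSON value, so parsing fails and SAFE. yields NULL.
SELECT
  SAFE.PARSE_JSON('{"score": 1.5}') AS valid_json,
  SAFE.PARSE_JSON('{"score": NaN}') AS invalid_json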
My answer applies to BigQuery Standard SQL. The RAND() function generates a pseudo-random value of type FLOAT64 in the range [0, 1), inclusive of 0 and exclusive of 1. You would use it for sampling much like the FARM_FINGERPRINT function, except that you don't need to specify any existing key. RAND() draws from a uniform distribution, so if some columns are skewed, the same skew is expected in the sample. For example, to sample roughly 10% of the rows in a table:
SELECT * FROM Table WHERE RAND() < 0.1
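Since RAND() is non-deterministic, the sample changes on every run. If you need a repeatable sample, a common alternative is to hash a key with the FARM_FINGERPRINT function mentioned above. This is a sketch that assumes your table has a user_id column to serve as the key:

-- Repeatable ~10% sample: rows with the same user_id always hash to the
-- same bucket (0-9), so the sample is stable across runs.
SELECT *
FROM Table
WHERE MOD(ABS(FARM_FINGERPRINT(CAST(user_id AS STRING))), 10) = 0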