I am using BigQuery, and I need to compute the 25th, 50th, and 75th percentiles of a column in a dataset.
For example, how can I get these numbers using BigQuery and Standard SQL? I have looked at the PERCENT_RANK, RANK, and NTILE functions but I can't seem to crack it.
Here's some code that may guide me
Appreciate the help!
To get percentiles, simply ask for 100 quantiles.
select
  percentiles[offset(10)] as p10,
  percentiles[offset(25)] as p25,
  percentiles[offset(50)] as p50,
  percentiles[offset(75)] as p75,
  percentiles[offset(90)] as p90,
from (
  select approx_quantiles(char_length(text), 100) percentiles
  from `bigquery-public-data.
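PERCENT_RANK() calculates the relative rank of each row within its partition, computed as (rank - 1) / (number of rows - 1). The result ranges from 0 to 1: the first row gets 0 and the highest value gets 1. It describes where each row stands relative to the others; it does not return the column value at a given percentile.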
If approximate aggregation does not work for you, you can use the PERCENTILE_CONT function instead, though it uses much more memory and so might not work on huge data, as in the following example:
SELECT
PERCENTILE_CONT(x, 0) OVER() AS min,
PERCENTILE_CONT(x, 0.01) OVER() AS percentile1,
PERCENTILE_CONT(x, 0.5) OVER() AS median,
PERCENTILE_CONT(x, 0.9) OVER() AS percentile90,
PERCENTILE_CONT(x, 1) OVER() AS max
FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1;
+-----+-------------+--------+--------------+-----+
| min | percentile1 | median | percentile90 | max |
+-----+-------------+--------+--------------+-----+
| 0   | 0.03        | 1.5    | 2.7          | 3   |
+-----+-------------+--------+--------------+-----+
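For the three percentiles asked about in the question, the same pattern applies. The following is a minimal sketch on the same inline sample data rather than a real table; the aliases p25, p50, and p75 are illustrative names, not anything BigQuery requires.

```sql
-- Exact 25th, 50th, and 75th percentiles of x (inline sample data).
-- PERCENTILE_CONT is a window function, so every input row carries the
-- same result; LIMIT 1 keeps a single copy of it.
-- NULL values are ignored by default.
SELECT
  PERCENTILE_CONT(x, 0.25) OVER() AS p25,
  PERCENTILE_CONT(x, 0.5) OVER() AS p50,
  PERCENTILE_CONT(x, 0.75) OVER() AS p75
FROM UNNEST([0, 3, NULL, 1, 2]) AS x
LIMIT 1;
```

With the non-NULL values 0, 1, 2, 3, linear interpolation gives 0.75, 1.5, and 2.25, which is consistent with the 0.03 and 2.7 shown in the table above.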
Check out the APPROX_QUANTILES function in Standard SQL. If you ask for 100 quantiles, you get percentiles. The query will look like the following:
SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)]
FROM (SELECT APPROX_QUANTILES(column, 100) percentiles FROM Table)
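To try this without a table of your own, here is a self-contained sketch of the same query. The input is an assumption made purely for illustration: the integers 1 to 1000 produced by GENERATE_ARRAY, with x standing in for your column.

```sql
-- APPROX_QUANTILES(x, 100) returns an array of 101 boundaries:
-- OFFSET(0) is the approximate minimum, OFFSET(100) the approximate maximum,
-- and OFFSET(n) the approximate nth percentile.
SELECT
  percentiles[OFFSET(25)] AS p25,
  percentiles[OFFSET(50)] AS p50,
  percentiles[OFFSET(75)] AS p75
FROM (
  SELECT APPROX_QUANTILES(x, 100) AS percentiles
  FROM UNNEST(GENERATE_ARRAY(1, 1000)) AS x  -- sample data, not a real table
);
```

Because APPROX_QUANTILES is an aggregate function, adding a GROUP BY to the inner query yields one percentile array per group. The results are approximate, so they may differ slightly from what PERCENTILE_CONT returns.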