Computing Percentiles In BigQuery

I am using BigQuery, and I need to compute the 25th, 50th, and 75th percentiles of a column in a dataset.

How can I get these numbers using BigQuery and Standard SQL? I have looked at the PERCENT_RANK, RANK, and NTILE functions, but I can't seem to crack it.

Here's some code that may guide me

Appreciate the help!

Praangrammer asked May 13 '17 00:05





2 Answers

If approximate aggregation does not work for you, you might want to use the PERCENTILE_CONT function instead. It uses much more memory, so it might not work for huge data. The following example is taken from the page linked in the original answer:

SELECT
  PERCENTILE_CONT(x, 0) OVER() AS min,
  PERCENTILE_CONT(x, 0.01) OVER() AS percentile1,
  PERCENTILE_CONT(x, 0.5) OVER() AS median,
  PERCENTILE_CONT(x, 0.9) OVER() AS percentile90,
  PERCENTILE_CONT(x, 1) OVER() AS max
FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1;

+-----+-------------+--------+--------------+-----+
| min | percentile1 | median | percentile90 | max |
+-----+-------------+--------+--------------+-----+
| 0   | 0.03        | 1.5    | 2.7          | 3   |
+-----+-------------+--------+--------------+-----+
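Applied to the three percentiles from the question, a minimal sketch might look like the query below. The inline array from GENERATE_ARRAY(1, 9) and the name x are placeholders for the real table and column, not part of the original answer. Because PERCENTILE_CONT is used as an analytic function here, every row carries the same values, so LIMIT 1 keeps a single row.

```sql
-- Placeholder data: the integers 1..9 stand in for the real column.
SELECT
  PERCENTILE_CONT(x, 0.25) OVER() AS p25,
  PERCENTILE_CONT(x, 0.5)  OVER() AS p50,
  PERCENTILE_CONT(x, 0.75) OVER() AS p75
FROM UNNEST(GENERATE_ARRAY(1, 9)) AS x
LIMIT 1;
-- With these nine values, linear interpolation should give
-- p25 = 3, p50 = 5, p75 = 7.
```

For a real table, replace the UNNEST(...) clause with the table name and x with the column of interest.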
Hoda answered Sep 21 '22 11:09



Check out the APPROX_QUANTILES function in Standard SQL. If you ask for 100 quantiles, you get percentiles. The query will look like the following:

SELECT percentiles[OFFSET(25)] AS p25,
       percentiles[OFFSET(50)] AS p50,
       percentiles[OFFSET(75)] AS p75
FROM (SELECT APPROX_QUANTILES(column, 100) AS percentiles FROM Table)
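APPROX_QUANTILES(expr, n) returns an array of n + 1 boundaries, from the approximate minimum at offset 0 to the approximate maximum at offset n, which is why offset 25 out of 100 corresponds to the 25th percentile. If only the quartiles are needed, asking for 4 quantiles should be enough. A sketch with placeholder data (the name x and the GENERATE_ARRAY input are assumptions, not from the original answer):

```sql
-- Placeholder data: the integers 1..1000 stand in for the real column.
SELECT
  quartiles[OFFSET(1)] AS p25,
  quartiles[OFFSET(2)] AS p50,
  quartiles[OFFSET(3)] AS p75
FROM (
  -- 4 quantiles yield 5 boundaries: min, p25, p50, p75, max.
  SELECT APPROX_QUANTILES(x, 4) AS quartiles
  FROM UNNEST(GENERATE_ARRAY(1, 1000)) AS x
);
```

The results are approximate, so they may differ slightly from what PERCENTILE_CONT returns on the same data.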
Mosha Pasumansky answered Sep 21 '22 11:09
