I need to calculate the median of a numeric sequence in Google BigQuery efficiently. Is this possible?

How to calculate median of a numeric sequence in Google BigQuery efficiently?

1 Answer

2018 update with more metrics:

BigQuery SQL: Average, geometric mean, remove outliers, median

For my own reference, here are working queries against the NYC taxi data:

Approximate quantiles (legacy SQL):

SELECT MONTH(pickup_datetime) month, NTH(51, QUANTILES(tip_amount,101)) median
FROM [nyc-tlc:green.trips_2015]
WHERE tip_amount > 0
GROUP BY 1
ORDER BY 1

Gives the same results as PERCENTILE_DISC:

SELECT month, FIRST(median) median
FROM (
  SELECT MONTH(pickup_datetime) month, tip_amount, PERCENTILE_DISC(0.5) OVER(PARTITION BY month ORDER BY tip_amount) median
  FROM [nyc-tlc:green.trips_2015]
  WHERE tip_amount > 0
)
GROUP BY 1
ORDER BY 1

Standard SQL:

#StandardSQL
SELECT DATE_TRUNC(DATE(pickup_datetime), MONTH) month, APPROX_QUANTILES(tip_amount,1000)[OFFSET(500)] median
FROM `nyc-tlc.green.trips_2015`
WHERE tip_amount > 0
GROUP BY 1
ORDER BY 1
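
The Standard SQL query above returns an approximate median. If an exact value is needed, one possible sketch uses the `PERCENTILE_CONT` analytic function on the same table. This is an illustration rather than part of the original answer. It assumes that `SELECT DISTINCT` is an acceptable way to collapse the per-row window results to one row per month, and it may be noticeably more expensive than `APPROX_QUANTILES` on large partitions.

```sql
#standardSQL
-- Exact median per month (sketch; same table and filter as the queries above).
-- PERCENTILE_CONT is an analytic function, so it runs over a window
-- instead of being combined with GROUP BY.
SELECT DISTINCT
  month,
  PERCENTILE_CONT(tip_amount, 0.5) OVER (PARTITION BY month) AS median
FROM (
  SELECT
    DATE_TRUNC(DATE(pickup_datetime), MONTH) AS month,
    tip_amount
  FROM `nyc-tlc.green.trips_2015`
  WHERE tip_amount > 0
)
ORDER BY month
```

`PERCENTILE_CONT` interpolates between the two middle values when a partition has an even number of rows. Swapping in `PERCENTILE_DISC(tip_amount, 0.5)` returns a value that actually occurs in the data, which is closer to the legacy SQL query shown earlier.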

196

answered Sep 24 '22 22:09

Felipe Hoffa


Tags:

median

google-bigquery

Manish Agrawal
