percentile functions with GROUPBY in BigQuery

Tags:

google-bigquery

In my CENSUS table, I'd like to group by State, and for each State get the median county population and the number of counties.

In psql, redshift, and snowflake, I can do this:

psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state;
        state         | count |  median
----------------------+-------+----------
 Alabama              |    67 |    36583
 Alaska               |    24 |   7296.5
 Arizona              |    15 |   116320
 Arkansas             |    75 |    20229
...

I'm trying to find a nice way to do this in standard BigQuery. I've noticed that there's undocumented percentile_cont analytical function available, but I have to do some major hacks to get it to do what I want.

I'd like to be able to do the same sort thing with what I've gathered are the correct arguments:

SELECT
  state,
  COUNT(county),
  PERCENTILE_CONT(population2000,
    0.5) OVER () AS `medPop`
FROM
  CENSUS
GROUP BY
  state;

but this query yields the error

SELECT list expression references column population2000 which is neither grouped nor aggregated at

I can get the answer I want, but I'd be very disappointed if this is the recommended way to do what I want to do:

SELECT
  MAX(nCounties) AS nCounties,
  state,
  MAX(medPop) AS medPop
FROM (
  SELECT
    nCounties,
    T1.state,
    (PERCENTILE_CONT(population2000,
        0.5) OVER (PARTITION BY T1.state)) AS `medPop`
  FROM
    census T1
  LEFT OUTER JOIN (
    SELECT
      COUNT(county) AS `nCounties`,
      state
    FROM
      census
    GROUP BY
      state) T2
  ON
    T1.state = T2.state) T3
GROUP BY
  state

Is there a better way to do what I want to do? Also, is the PERCENTILE_CONT function ever going to be documented?

Thanks for reading!

289

asked Aug 09 '17 00:08

donhcd

1 Answers

Thanks for your interest. PERCENTILE_CONT is under development, and we will publish the documentation once it is GA. We will support it as analytic function first, and we plan to support it as aggregate function (allowing GROUP BY) later. Between these 2 releases, a simpler workaround would be

SELECT
  state,
  ANY_VALUE(nCounties) AS nCounties,
  ANY_VALUE(medPop) AS medPop
FROM (
  SELECT
    state,
    COUNT(county) OVER (PARTITION BY state) AS nCounties,
    PERCENTILE_CONT(population2000,
      0.5) OVER (PARTITION BY state) AS medPop
  FROM
    CENSUS)
GROUP BY
  state

115

answered Oct 16 '22 07:10

Mingyu Zhong

Related questions
                            
                                multiprocessing returns "too many open files" but using `with...as` fixes it. Why?
                            
                                OpenGL object in C++ RAII class no longer works
                            
                                How to NOT download artifacts from previous stages for build configuration?
                            
                                Cloud Firestore Case Insensitive Sorting Using Query
                            
                                How to set multiple values with helm?
                            
                                embed openstreetmap iframe in github markdown
                            
                                Getting TypeError: reduction operation 'argmax' not allowed for this dtype when trying to use idxmax()
                            
                                How to drop a collection with pymongo?
                            
                                Why is arrow syntax preferred over function declaration for functional React components?
                            
                                keytool error: java.lang.Exception: Only one command is allowed: both -exportcert and -list were specified
                            
                                Angular component default style css display block
                            
                                What's the difference between Azure AD B2C tenant and normal Azure AD tenant?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With