How do you calculate a boolean aggregate over a column in BigQuery?

Question

I have a table of events of users, and I want to project those events into a new column with some predicate, and then aggregate the events together per user into a new projection that tells me if a user has ever had the predicate match for them, or if they've never had it match, etc.

In other languages this is usually called all() and any(), where you pass it a list of boolean values and it will tell you if all of them match, or if at least one matches. It's equivalent to using a boolean AND on all boolean values (such as in the case with all) or using a boolean OR on all boolean values (as in any).

Does BigQuery have this feature? I can sort of approximate it using max and min but it's not ideal.

Example:

select
month(date_time) m,
count(*) as ct,
max(id_is_present),
min(id_is_present),
max(starts_with_one) max_one,
min(starts_with_one) min_one,
from
(
    select
    length(user_id) > 1 id_is_present,
    regexp_match(user_id, r'^1') starts_with_one,
    date_time
    from
    [user_events.2015_02]
)
group by
m

It's exploiting a behavior of max(true, false, false) yielding true, so you could sort of implement any and all by searching through the column for values and then building from there.

Is this the hack I have to rely on or does BigQuery support boolean aggregates?

Gordon Linoff · Accepted Answer

In case someone else stumbles across this, standard SQL offers logical_and() and logical_or. So, the code could be written as:

select month(date_time) as m, count(*) as ct,
       logical_or(id_is_present),
       logical_and(id_is_present),
       logical_or(starts_with_one) as max_one,
       logical_and(starts_with_one) min_one,
from (select length(user_id) > 1 id_is_present,
             regexp_match(user_id, r'^1') starts_with_one,
             date_time
      from [user_events.2015_02]
      ) u
group by m;

How do you calculate a boolean aggregate over a column in BigQuery?

Tags:

sql

aggregate-functions

google-bigquery

Conrad.Dean

1 Answers

Gordon Linoff

Recent Activity

Donate For Us

How do you calculate a boolean aggregate over a column in BigQuery?

Tags:

sql

aggregate-functions

google-bigquery

Conrad.Dean

1 Answers

Gordon Linoff

Related questions

Recent Activity

Donate For Us