I have a table of events of users, and I want to project those events into a new column with some predicate, and then aggregate the events together per user into a new projection that tells me if a user has ever had the predicate match for them, or if they've never had it match, etc.
In other languages this is usually called all()
and any()
, where you pass
it a list of boolean values and it will tell you if all of them match, or if at
least one matches. It's equivalent to using a boolean AND
on all boolean
values (such as in the case with all
) or using a boolean OR
on all boolean
values (as in any
).
Does BigQuery have this feature? I can sort of approximate it using max
and
min
but it's not ideal.
Example:
select
month(date_time) m,
count(*) as ct,
max(id_is_present),
min(id_is_present),
max(starts_with_one) max_one,
min(starts_with_one) min_one,
from
(
select
length(user_id) > 1 id_is_present,
regexp_match(user_id, r'^1') starts_with_one,
date_time
from
[user_events.2015_02]
)
group by
m
It's exploiting a behavior of max(true, false, false)
yielding true
, so you could sort of implement any
and all
by searching through the column for values and then building from there.
Is this the hack I have to rely on or does BigQuery support boolean aggregates?
In case someone else stumbles across this, standard SQL offers logical_and()
and logical_or
. So, the code could be written as:
select month(date_time) as m, count(*) as ct,
logical_or(id_is_present),
logical_and(id_is_present),
logical_or(starts_with_one) as max_one,
logical_and(starts_with_one) min_one,
from (select length(user_id) > 1 id_is_present,
regexp_match(user_id, r'^1') starts_with_one,
date_time
from [user_events.2015_02]
) u
group by m;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With