I have a Google Big Query Table that has an email
column in it. Basically each rows shows a state the user with that email address existed in. What I want to do is query the table to get a result showing the most recent row per email address. I've tried all sorts of GROUP BY
's, JOIN
ing the table against itself and the usual fun stuff that I would use in MySQL, but I keep getting duplicate emails returned if the entire row isn't a match.
Any help is much appreciated!
Sample Data
user_email | user_first_name | user_last_name | time | is_deleted
[email protected] | Joe | John | 123456790 | 1
[email protected] | Joe | John | 123456789 | 0
[email protected] | Jill | John | 123456789 | 0
So if sampling that data I would want to return:
user_email | user_first_name | user_last_name | time | is_deleted
[email protected] | Joe | John | 123456790 | 1
[email protected] | Jill | John | 123456789 | 0
SELECT user_email, user_first_name, user_last_name, time, is_deleted
FROM (
SELECT user_email, user_first_name, user_last_name, time, is_deleted
, RANK() OVER(PARTITION BY user_email ORDER BY time DESC) rank
FROM table
)
WHERE rank=1
Solved!
SELECT l.* FROM [mytable.list] l JOIN (
SELECT user_email, MAX(time) as time FROM [mytable.list] GROUP EACH BY user_email
) j ON j.user_email = l.user_email WHERE j.time = l.time;
There is a potential shortcoming of the use of RANK()
over the alternative numbering function ROW_NUMBER()
. The accepted answer does provide the desired solution, except in the event in a tie in the order by clause, were duplicate records are again returned:
with minimal_reproducible as (
select '[email protected]' as user_email, 'Joe' as user_first_name, 'John' as user_last_name, 123456789 as time, 1 is_deleted
union all
select '[email protected]', 'Joe', 'John', 123456789, 0
union all
select '[email protected]', 'Jill', 'John', 123456789, 0
)
select user_email, user_first_name, user_last_name, time, is_deleted from (
select *,
rank() over (partition by user_email order by time desc) as rank
from minimal_reproducible) inner_table
where rank = 1;
A better solution therefore is to use ROW_NUMBER()
in the place of RANK()
to ensure (albeit arbitrarily) unique user_email
come what may:
with minimal_reproducible as (
select '[email protected]' as user_email, 'Joe' as user_first_name, 'John' as user_last_name, 123456789 as time, 1 is_deleted
union all
select '[email protected]', 'Joe', 'John', 123456789, 0
union all
select '[email protected]', 'Jill', 'John', 123456789, 0
)
select user_email, user_first_name, user_last_name, time, is_deleted from (
select *,
row_number() over (partition by user_email order by time desc) as row_number
from minimal_reproducible) inner_table
where row_number = 1;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With