Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery: Return First Value from Different Groups in a Group By

I am currently having a problem with a Standard SQL query. I have a list of emails where every email can have multiple functions. See the example below on how the table looks like.

Email                         Function
[email protected]               engineer
[email protected]               specialist
[email protected]                analyst
[email protected]                tester
[email protected]                manager
[email protected]             intern

What I want is a query that returns every email once with the first function it finds. So the above table should return the following:

Email                         Function
[email protected]               engineer
[email protected]                analyst
[email protected]             intern

How do I do this?

What I have right now is a simplified version of the query.

SELECT Email, Function
FROM database
GROUP BY Email, Function

The issue is here is that I have to put both Email and Function in the GROUP BY. If I only put Email in the Group By the query cannot run even though I only want the query to GROUP BY Email.

Thanks!

like image 710
Peter Nagel Avatar asked Mar 14 '19 12:03

Peter Nagel


People also ask

How do I select the first row in a group by a group?

To do that, you can use the ROW_NUMBER() function. In OVER() , you specify the groups into which the rows should be divided ( PARTITION BY ) and the order in which the numbers should be assigned to the rows ( ORDER BY ). You assign the row numbers within each group (i.e., year).

What is array AGG in BigQuery?

ARRAY_AGG. Returns an ARRAY of expression values. To learn more about the optional arguments in this function and how to use them, see Aggregate function calls. To learn more about the OVER clause and how to use it, see Window function calls.

What is String_agg in BigQuery?

BigQuery STRING_AGG function returns a value by gathering all expressions in a row and generating a concatenated string of non-null values. The value can be either STRING or BYTES. The result generally depends on the first expression taken by the BigQuery STRING_AGG function.


1 Answers

There is no such thing as the "first" function, because SQL tables represent unordered sets -- especially in a designed-from-the-ground-up parallel database such as BigQuery.

You need to use some sort of aggregation function.

A simple one is any_value():

SELECT Email, ANY_VALUE(Function)
FROM database
GROUP BY Email;

If you have another column that specifies the ordering, then it can be used to fetch the function associated with the minimum value of that column.

like image 56
Gordon Linoff Avatar answered Sep 23 '22 22:09

Gordon Linoff