Suppose you have a denormalized schema that stores one row per property, like this:
uuid | property   | value
------------------------------------------
abc  | first_name | John
abc  | last_name  | Connor
abc  | age        | 26
...
Every uuid has the same set of properties, but the rows are not necessarily sorted. How can I create a table like the following using only BigQuery (i.e. no client)?
Table user_properties:
uuid | first_name | last_name | age
--------------------------------------------------------
abc  | John       | Connor    | 26
In some other SQL dialects (e.g. SQL Server) there is the "STUFF" function, which is sometimes used for this kind of task.
It would be easier if I could at least get the results ordered by uuid, so that the client would not need to load the whole table (4GB) to sort it: it could hydrate each entity by sequentially scanning the rows with the same uuid. However, a query like this:
SELECT * FROM user_properties ORDER BY uuid;
exceeds the available resources in BigQuery (and using allowLargeResults forbids ORDER BY). It almost seems that I cannot sort a large table (4GB) in BigQuery unless I subscribe to a high-end machine. Any ideas?
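For reference, the sequential hydration described in the question could look roughly like the sketch below. It assumes the rows arrive already ordered by uuid; the `hydrate` helper and the second sample user are hypothetical and only serve as illustration.

```python
from itertools import groupby
from operator import itemgetter


def hydrate(rows):
    """Yield one dict per uuid from (uuid, property, value) rows sorted by uuid."""
    # groupby only merges adjacent rows, which is why the input must be sorted.
    for uuid, group in groupby(rows, key=itemgetter(0)):
        entity = {"uuid": uuid}
        for _, prop, value in group:
            entity[prop] = value
        yield entity


# Hypothetical sample rows, already ordered by uuid (properties in any order).
rows = [
    ("abc", "first_name", "John"),
    ("abc", "last_name", "Connor"),
    ("abc", "age", "26"),
    ("def", "age", "31"),
    ("def", "first_name", "Sarah"),
    ("def", "last_name", "Connor"),
]
users = list(hydrate(rows))
```

Because only one group is held in memory at a time, the client never needs the whole table at once.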
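One option is conditional aggregation: group by uuid and pick out each property's value with MAX(IF(...)):
SELECT
  uuid,
  MAX(IF(property = 'first_name', value, NULL)) AS first_name,
  MAX(IF(property = 'last_name', value, NULL)) AS last_name,
  MAX(IF(property = 'age', value, NULL)) AS age
FROM user_properties
GROUP BY uuid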
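To try the conditional-aggregation idea locally, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database. SQLite standing in for BigQuery is an assumption made purely for testing, and `CASE WHEN` replaces `IF` because it is more portable across SQL dialects.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        ("abc", "first_name", "John"),
        ("abc", "last_name", "Connor"),
        ("abc", "age", "26"),
    ],
)

# CASE WHEN without ELSE yields NULL for non-matching rows, and MAX ignores
# NULLs, so each output column keeps the one non-NULL value for that uuid.
pivot_sql = """
SELECT
  uuid,
  MAX(CASE WHEN property = 'first_name' THEN value END) AS first_name,
  MAX(CASE WHEN property = 'last_name' THEN value END) AS last_name,
  MAX(CASE WHEN property = 'age' THEN value END) AS age
FROM user_properties
GROUP BY uuid
"""
pivoted = conn.execute(pivot_sql).fetchall()
```

The row order of the input does not matter here, since the aggregation looks at each row's property name rather than its position.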
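Another option, with no grouping involved. It relies on every uuid having exactly these three properties, because the LEAD offsets follow the alphabetical order of property (age, first_name, last_name):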
SELECT uuid, first_name, last_name, age
FROM (
  SELECT
    uuid,
    LEAD(value, 1) OVER(PARTITION BY uuid ORDER BY property) AS first_name,
    LEAD(value, 2) OVER(PARTITION BY uuid ORDER BY property) AS last_name,
    value AS age,
    property = 'age' AS anchor
  FROM user_properties
)
HAVING anchor
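The window-function variant can be sketched the same way in SQLite (again only a local stand-in for BigQuery, and it needs SQLite 3.25 or newer for window functions). In this sketch the outer filter is written as `WHERE anchor`, and a hypothetical second uuid `xyz` with no first_name shows how positional offsets misplace values when a property is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        ("abc", "first_name", "John"),
        ("abc", "last_name", "Connor"),
        ("abc", "age", "26"),
        # Hypothetical incomplete user: no first_name row.
        ("xyz", "last_name", "Reese"),
        ("xyz", "age", "40"),
    ],
)

# Within each uuid, rows sort as age, first_name, last_name, so from the
# 'age' row LEAD(…, 1) reads first_name and LEAD(…, 2) reads last_name.
lead_sql = """
SELECT uuid, first_name, last_name, age
FROM (
  SELECT
    uuid,
    LEAD(value, 1) OVER (PARTITION BY uuid ORDER BY property) AS first_name,
    LEAD(value, 2) OVER (PARTITION BY uuid ORDER BY property) AS last_name,
    value AS age,
    property = 'age' AS anchor
  FROM user_properties
)
WHERE anchor
ORDER BY uuid
"""
result = conn.execute(lead_sql).fetchall()
complete_user, incomplete_user = result
```

For `abc` the columns line up as expected, while for `xyz` the last name lands in the first_name column. The approach is therefore only safe when every uuid really has the full set of properties, as the question states.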