I am new to BigQuery (and sql) and am trying to setup a table where each row contains a message
column (string) and metadata
column (repeated). I want to use the metadata column so that I can query for messages that match certain metadata parameters.
For example, my message string is:
Hi honey, I'm home!
and the metadata column is arbitrary key/value pairs that describe the message:
{"category": "personal",
"message_type": "salutation"}
This would allow me to ask bigquery to:
return me all messages which have a category
of personal
(or message_type
of personal
, etc)
To reiterate, the metadata keys will potentially be different for every message row that is inserted (e.g. a new row may have a metadata key of favorite_color
instead of category
).
Here is the schema that I came up with in order to support the metadata flexibility that I desire:
[
{"name": "message", "type": "string", "mode": "required"}
{"name": "metadata", "type": "record", "mode": "repeated", "fields": [ {"name": "key",
"type": "string",
"mode": "required"},
{"name": "value",
"type": "string",
"mode": "required"}]}
]
The schema expects that every metadata entry/row contain a column called key
and column called value
. It seems to work fine.
Interestingly, Bigquery represents my message row as two rows (which I guess is called "flattening"?):
message | metadata_key | metadata_value
------------------------------------------------------------
Hi honey, I'm home! category personal
Hi honey, I'm home! message_type salutation
I figured out how to query against a single metadata value such as category
=personal
:
SELECT * FROM [table.test]
WHERE
metadata.key="category" AND metadata.value="personal"
I get a single row back:
message | metadata_key | metadata_value
------------------------------------------------------------
Hi honey, I'm home! category personal
This is great! However, I don't know how to construct a more complex query that matches against multiple metadata parameters, such as:
Fetch me all messages which have a category
of personal
and message_type
of salutation
This following query doesn't return anything:
SELECT * FROM [table.test]
WHERE
metadata.key="category" AND metadata.value="personal" AND
metadata.key="message_type" AND metadata.value="salutation"
I assume this is because bigquery is breaking down my single message row into two rows (one for each nested metadata row). This obviously means that my query will never be able to match a single row because I am trying to match against two different metadata rows (i.e. I'm asking for metadata.key to be equal to two different values at the same time, and metadata.value to be equal to different values at the same time). So I'm somewhat stumped as to how to construct my query appropriately.
Ideally I would like BigQuery to return a single row (the original example row that I started with), since it satisfies the two metadata requirements of my query.
I assume I need some combination of GROUP BY, FLATTEN, WITHIN, unflatten, etc. The combination of possibilities/structure are somewhat daunting for this sql/bigquery novice. Any help appreciated :)
How to Query BigQuery Repeated Fields. To extract information from a repeated field in BigQuery, you must use a more exotic pattern. This is normally done using the UNNEST function, which converts an array of values in a table into rows. These can then be joined to the original table to be queried.
A repeated field can be accessed as an ARRAY type in standard SQL. A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY . An array cannot contain another array directly.
This is just a loop over every day, week, etc on the same table, and then a COUNT on the distinct events happening within that period of time. But we can't do 'loops' in BigQuery.
To convert an ARRAY into a set of rows, also known as "flattening," use the UNNEST operator. UNNEST takes an ARRAY and returns a table with a single row for each element in the ARRAY . Because UNNEST destroys the order of the ARRAY elements, you may wish to restore order to the table.
Try below - just one of options
It sums how many "matches you have in each row and if all (in this cases - 2) it outputs it. Pretty simple
SELECT *,
SUM((metadata.key="category" AND metadata.value="personal")
OR (metadata.key="message_type" AND metadata.value="salutation")
) WITHIN RECORD AS check,
FROM [table.test]
HAVING check = 2
Cleaner version of this would be as below
SELECT *
FROM [table.test]
OMIT RECORD IF
SUM((metadata.key="category" AND metadata.value="personal")
OR (metadata.key="message_type" AND metadata.value="salutation")
) != 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With