I have a set of day-sharded data where individual entries do not contain the day. I would like to use table wildcards to select all available data and get back data that is grouped by both the column I am interested in and the day that it was captured. Something, in other words, like this:
SELECT table_id, identifier, Sum(AppAnalytic) as AppAnalyticCount
FROM (TABLE_QUERY(database_main,'table_id CONTAINS "Title_" AND length(table_id) >= 4'))
GROUP BY identifier, table_id order by AppAnalyticCount DESC LIMIT 10
Of course, this does not actually work because table_id is not visible in the table aggregation resulting from the TABLE_QUERY function. Is there any way to accomplish this? Some sort of join on table metadata perhaps?
Even if you restrict the number of tables that you want to use from the wildcard table using the _TABLE_SUFFIX pseudo column in a WHERE clause, BigQuery uses the schema for the most recently created table that matches the wildcard.
To convert an ARRAY into a set of rows, also known as "flattening," use the UNNEST operator. UNNEST takes an ARRAY and returns a table with a single row for each element in the ARRAY . Because UNNEST destroys the order of the ARRAY elements, you may wish to restore order to the table.
This functionality is available now in BigQuery through _TABLE_SUFFIX
pseudocolumn. Full documentation is at https://cloud.google.com/bigquery/docs/querying-wildcard-tables.
Couple of things to note:
You will have to rename _TABLE_SUFFIX
into something else in your SELECT
list, i.e. following example illustrates it
SELECT _TABLE_SUFFIX as table_id, ... FROM `MyDataset.MyTablePrefix_*`
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With