A couple of questions about the <code>TABLE_QUERY()</code> function: <ul> <li>The examples show <code>table_id</code> used in the query string. Are other fields available?</li> <li>It seems difficult to debug: I get "error evaluating subsidiary query" when I try to use it.</li> <li>How does <code>TABLE_QUERY()</code> work?</li> </ul>



How do I use the TABLE_QUERY() function in BigQuery?

2 Answers

The TABLE_QUERY() function allows you to write a SQL WHERE clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples dataset that are older than 7 days:

SELECT COUNT(*)
FROM TABLE_QUERY(publicdata:samples,
    "MSEC_TO_TIMESTAMP(creation_time) < "
    + "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")

Or you can run this query over all tables that have ‘git’ in the name (the github_timeline and github_nested sample tables) to find the most common URLs:

SELECT url, COUNT(*) AS url_count
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY url_count DESC
LIMIT 100

Although it is very powerful, TABLE_QUERY() can be difficult to use. The WHERE clause must be specified as a string, which is awkward. It is also hard to debug: when something goes wrong, the only error you get is “Error evaluating subsidiary query”, which rarely points at the actual problem.

How it works:

TABLE_QUERY() essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>), BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query> to get the list of table IDs to run the query on, then it executes your actual query over those tables.

The __TABLES_SUMMARY__ portion of that query may look unfamiliar. __TABLES_SUMMARY__ is a meta-table containing information about the tables in a dataset. You can query this meta-table yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ returns metadata about the tables in the publicdata:samples dataset.
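The two-step evaluation described above can be sketched locally. This is only an illustration, not BigQuery's implementation: it uses Python's built-in sqlite3 module, a hypothetical tables_summary table standing in for __TABLES_SUMMARY__, made-up creation_time values, and SQLite's LIKE in place of the legacy SQL CONTAINS operator.

```python
import sqlite3

# Stand-in for <dataset>.__TABLES_SUMMARY__; the rows and timestamps are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tables_summary (table_id TEXT, creation_time INTEGER, type INTEGER)"
)
conn.executemany(
    "INSERT INTO tables_summary VALUES (?, ?, ?)",
    [
        ("github_nested", 1000, 1),
        ("github_timeline", 2000, 1),
        ("shakespeare", 3000, 1),
    ],
)


def resolve_tables(conn, table_query):
    """Step 1: evaluate the caller's WHERE clause against the metadata table."""
    sql = f"SELECT table_id FROM tables_summary WHERE {table_query} ORDER BY table_id"
    return [row[0] for row in conn.execute(sql)]


def run_over(tables):
    """Step 2 (stub): the real query would now scan exactly these tables."""
    return f"scanning {len(tables)} table(s): {', '.join(tables)}"


matched = resolve_tables(conn, "table_id LIKE '%git%'")
print(run_over(matched))

# A predicate on a column the summary table lacks fails in step 1,
# before any data table is touched.
try:
    resolve_tables(conn, "row_count > 0")
except sqlite3.OperationalError as exc:
    print("step 1 failed:", exc)
```

The second call shows why errors surface from the table-selection step rather than from the outer query: the predicate is evaluated on its own, against the metadata table only.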

Available Fields:

The __TABLES_SUMMARY__ meta-table has the following fields, all of which are available in a TABLE_QUERY() query string:

table_id: name of the table.
creation_time: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as the creation_time field on the table.
type: whether it is a view (2) or regular table (1).

The following fields are not available in TABLE_QUERY() because they are members of __TABLES__ but not of __TABLES_SUMMARY__. They're kept here for historical interest and to partially document the __TABLES__ meta-table:

last_modified_time: time, in milliseconds since 1/1/1970 UTC, that the table was last updated (either metadata or table contents). Note that if you use tabledata.insertAll() to stream records to your table, this value might be a few minutes out of date.
row_count: number of rows in the table.
size_bytes: total size in bytes of the table.
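Because creation_time and last_modified_time are plain millisecond counts, it can help to convert a value by hand when checking a predicate. The following is a minimal sketch in Python, not part of BigQuery; the helper names, the sample timestamp, and the fixed "now" are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def msec_to_datetime(msec):
    """Convert milliseconds since 1970-01-01 UTC to a timezone-aware datetime."""
    return EPOCH + timedelta(milliseconds=msec)


def older_than_days(creation_time_msec, days, now):
    """True if the table was created more than `days` days before `now`."""
    return msec_to_datetime(creation_time_msec) < now - timedelta(days=days)


# Made-up values: a fixed "now" keeps the example reproducible.
now = datetime(2022, 10, 2, tzinfo=timezone.utc)
created = 1_664_000_000_000

print(msec_to_datetime(created).isoformat())
print(older_than_days(created, 7, now))
```

The comparison in older_than_days mirrors the shape of the "older than 7 days" predicate used earlier: convert the millisecond value to a timestamp, then compare it with the current time minus an interval.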

How to debug:

To debug your TABLE_QUERY() queries, you can do the same thing that BigQuery does: run the meta-table query yourself. For example:

SELECT *
FROM publicdata:samples.__TABLES_SUMMARY__
WHERE MSEC_TO_TIMESTAMP(creation_time) <
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')

Running the meta-table query directly lets you not only debug your predicate but also see which tables TABLE_QUERY() would select. Once the inner query works, you can combine it with your full query over those tables.
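One way to keep the debug query and the real query in sync is to generate both from a single predicate string. The sketch below is a hypothetical Python helper, not a BigQuery client API: it only builds query text, following the query shapes shown in this answer, and leaves running the queries to whatever tool you already use.

```python
def metadata_query(dataset, predicate):
    """The query to run by hand when debugging a predicate."""
    return f"SELECT * FROM {dataset}.__TABLES_SUMMARY__ WHERE {predicate}"


def table_query_source(dataset, predicate):
    """The TABLE_QUERY() source built from the same predicate."""
    if '"' in predicate:
        # The predicate is embedded in a double-quoted string literal,
        # so this sketch only accepts single quotes inside it.
        raise ValueError("use single quotes inside the predicate")
    return f'TABLE_QUERY({dataset}, "{predicate}")'


dataset = "publicdata:samples"
predicate = "table_id CONTAINS 'git'"

# Step 1: run this one first and inspect the tables it returns.
print(metadata_query(dataset, predicate))
# Step 2: once the predicate works, use it in the full query.
print("SELECT COUNT(*) FROM " + table_query_source(dataset, predicate))
```

Because both strings come from the same predicate, a predicate that returns the expected rows from the meta-table query is exactly the one TABLE_QUERY() will evaluate.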


answered Oct 02 '22 07:10

Jordan Tigani

Alternative answer, for those moving forward to Standard SQL:

BigQuery Standard SQL doesn't support TABLE_QUERY, but it supports * expansion for table names.
When expanding table names with *, you can use the meta-column _TABLE_SUFFIX to narrow the selection.
Table expansion with * only works when all tables have compatible schemas.

For example, to get the average worldwide NOAA GSOD temperature between 2010 and 2014:


#standardSQL
SELECT AVG(temp) avg_temp, _TABLE_SUFFIX y
FROM `bigquery-public-data.noaa_gsod.gsod20*` #every year that starts with "20"
WHERE _TABLE_SUFFIX BETWEEN "10" AND "14" #only years between 2010 and 2014
GROUP BY y
ORDER BY y
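To make the role of _TABLE_SUFFIX concrete, here is a small Python sketch of the matching rule as described above: the suffix is whatever the * stands for, and the filter compares it as a string. The table names and the logs_20 prefix are hypothetical, and this only models the selection logic, not how BigQuery implements it.

```python
def table_suffix(table_id, prefix):
    """Return the part of table_id matched by `*`, or None if the prefix differs."""
    if table_id.startswith(prefix):
        return table_id[len(prefix):]
    return None


# Hypothetical tables in one dataset, one table per year.
tables = ["logs_1999", "logs_2009", "logs_2010", "logs_2012", "logs_2014", "logs_2015"]
prefix = "logs_20"  # corresponds to the wildcard name `logs_20*`

selected = []
for table_id in tables:
    suffix = table_suffix(table_id, prefix)
    # Same shape as: WHERE _TABLE_SUFFIX BETWEEN "10" AND "14"
    if suffix is not None and "10" <= suffix <= "14":
        selected.append(table_id)

print(selected)
```

Note that the comparison is lexicographic, since the suffix is a string. With two-digit suffixes this behaves like a numeric range, but a suffix such as "100" would also fall between "10" and "14".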

answered Oct 02 '22 06:10

Felipe Hoffa
