I create daily tables. How to query the "newest" one?
For example GitHubArchive now publishes daily tables - instead of a monolithic one (see /r/bigquery/.../github_archive_changes_monthly_and_daily_tables/).
To change the start time, select the Start at set time option, enter the desired start date and time, and then save. Note: If the specified start time is later than the time in the schedule, then the first run of the query will be in the next iteration of the cycle.
The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables. You can execute DML statements just as you would a SELECT statement, with the following conditions: You must use Google Standard SQL.
To find the latest table, you could use BigQuery's table query functions:
For example, the latest daily GitHubArchive table:
SELECT COUNT(*) num_records
FROM TABLE_QUERY(githubarchive:day,
"table_id IN (
SELECT table_id FROM githubarchive:day.__TABLES__
ORDER BY creation_time DESC LIMIT 1)")
# 201859
For maximum convenience, you can store that query as a view to share with others. Then to query the latest table, just query the view:
SELECT num_records
FROM [fh-bigquery:public_dump.githubarchive_latest_day]
# 201859
For more on table query functions, see Jordan's answer to How do I use the TABLE_QUERY() function in BigQuery?.
Felipe's answer uses Legacy SQL. Here's a solution in Standard SQL.
If your table names have a predictable suffix like GitHub's example in your link (which uses a YYYYMMDD
suffix for each day), you can use wildcard tables and TABLE_SUFFIX
:
SELECT COUNT(*)
FROM `githubarchive.day.events_*`
WHERE _TABLE_SUFFIX = (
SELECT MAX(SUBSTR(table_id, -8))
FROM `githubarchive.day.__TABLES_SUMMARY__`
WHERE REGEXP_CONTAINS(table_id, "events_\\d{8}")
)
If you want to rely on BigQuery's metadata about creation time instead of the prefix, you can do:
SELECT COUNT(*)
FROM `githubarchive.day.*`
WHERE _TABLE_SUFFIX = (
SELECT table_id
FROM `githubarchive.day.__TABLES_SUMMARY__`
ORDER BY creation_time DESC
LIMIT 1
)
Note that if you have tables with different schemas in your project, the latter syntax is dangerous, since the query will use the schema of the first table it sees in the wildcard.
If you're trying to create tables that are partitioned by date, there's a BigQuery feature called date-partitioned tables that does just that, and makes it easier to query afterwards.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With