I have the following requirement.
We have 1400 sharded tables in BQ, grouped by function. Most of these sharded tables are also time-partitioned by DAY.
Smaller tables are not time-partitioned.
I am trying to create views dynamically in BigQuery by doing a UNION of all the sharded tables and writing a WHERE filter condition on _partitiontime.
The problem is that the query fails whenever the UNION includes one of the smaller tables that is not time-partitioned.
Many more tables are created on a daily basis, and I can't settle for a static exclusion solution, since that would require manually maintaining a file with table names etc. (Apache Beam + BQ uses the incoming data and creates new sharded tables for newer functional groups without manual intervention.)
I am trying to exclude the non-time-partitioned tables by using the bq utility to connect to BQ more than 1000 times and check whether each table is time-partitioned. For each table I run
bq show --format=prettyjson
and check the timePartitioning type field.
This is very slow: it takes more than 30 minutes.
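For reference, the per-table check described above can be sketched in Python as follows. This is only an illustration of the check itself, not a fix for the slowness. It assumes the JSON printed by bq show contains a timePartitioning object with a type key when the table is time-partitioned; the function name and the two trimmed-down sample payloads are hypothetical.

```python
import json


def is_time_partitioned(show_output: str) -> bool:
    """Return True if the table metadata JSON carries a timePartitioning.type value."""
    table = json.loads(show_output)
    # Tables without time partitioning are assumed to omit the key entirely.
    partitioning = table.get("timePartitioning") or {}
    return bool(partitioning.get("type"))


# Hypothetical, heavily trimmed samples of `bq show --format=prettyjson` output.
partitioned_sample = (
    '{"tableReference": {"tableId": "table1"}, "timePartitioning": {"type": "DAY"}}'
)
unpartitioned_sample = '{"tableReference": {"tableId": "small_table"}}'

if __name__ == "__main__":
    print(is_time_partitioned(partitioned_sample))
    print(is_time_partitioned(unpartitioned_sample))
```

Running this once per table still means one bq call per table, which is where the 30 minutes go.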
I have tried __TABLES_SUMMARY__, but it does not have the partitioning info.
I have also checked
SELECT partition_id from [mydataset.table1$__PARTITIONS_SUMMARY__];
but this works only if the table is already partitioned.
To retrieve table metadata from INFORMATION_SCHEMA views, you need an Identity and Access Management (IAM) role that grants the necessary permissions, such as roles/bigquery.admin.
If you want to query data based on a time zone other than UTC, choose one of the following options: Adjust for time zone differences in your SQL queries. Use partition decorators to load data into specific ingestion-time partitions, based on a different time zone than UTC.
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
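You can run the query below in standard SQL in BQ to get this information. Replace your_table_name with the table you want to check, or drop that condition to list the partitioning column of every partitioned table in the dataset with a single query.
SELECT *
FROM `project_id.dataset_name.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'your_table_name'
AND is_partitioning_column = 'YES'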
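Once the dataset-wide version of that query has returned its rows, building the view is a matter of string assembly. The following is a minimal Python sketch under a few assumptions: the rows are available as (table_name, column_name) pairs, ingestion-time partitioned tables show up with the _PARTITIONTIME pseudo-column as their partitioning column, and all sharded tables share a schema so that SELECT * can be unioned. The function name, the parameter names, and the example filter are hypothetical.

```python
from typing import Iterable, Tuple


def build_view_sql(
    project: str,
    dataset: str,
    view: str,
    partition_rows: Iterable[Tuple[str, str]],
    partition_filter: str,
) -> str:
    """Build a CREATE OR REPLACE VIEW statement over ingestion-time partitioned tables.

    partition_rows holds (table_name, column_name) pairs taken from
    INFORMATION_SCHEMA.COLUMNS where is_partitioning_column = 'YES'.
    """
    # Keep only tables partitioned on _PARTITIONTIME, so the filter is valid for each.
    names = sorted({t for t, col in partition_rows if col == "_PARTITIONTIME"})
    if not names:
        raise ValueError("no ingestion-time partitioned tables to union")
    selects = [
        f"SELECT * FROM `{project}.{dataset}.{name}` WHERE {partition_filter}"
        for name in names
    ]
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW `{project}.{dataset}.{view}` AS\n{body}"


if __name__ == "__main__":
    rows = [("events_a", "_PARTITIONTIME"), ("events_b", "_PARTITIONTIME")]
    print(
        build_view_sql(
            "project_id", "dataset_name", "all_events", rows,
            "_PARTITIONTIME >= TIMESTAMP('2024-01-01')",
        )
    )
```

Because the table list comes from one metadata query instead of more than 1000 bq calls, newly created shards are picked up each time the view is rebuilt, with no exclusion file to maintain.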