We want to start using "Partitioned tables" in BQ But documentation(https://cloud.google.com/bigquery/docs/partitioned-tables) says that using "Streaming inserts" possible only
if the partitioning value is up to 7 days in the past, up to 3 days in the future,
In our case, we have some data which could have q partition value more than 7 days in the past.
We save data via BigQuery REST api
Does it mean that we can't use the partition tables or is there some other workaround this? How to save the data which is out of bounds(7days3days) for a partition table?
In general, the idea is: we have a table with streaming data(~100 records per min) and we want to stream data directly to partitions and then use the partitions for analytical queries
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
If you want to query data based on a time zone other than UTC, choose one of the following options: Adjust for time zone differences in your SQL queries. Use partition decorators to load data into specific ingestion-time partitions, based on a different time zone than UTC.
You can always over-write a partitioned table in BQ using the postfix of YYYYMMDD in the output table name of your query, along with using WRITE_TRUNCATE as your write disposition (i.e. to truncate whatever is existing in that partition and write new results).
To summarize the comment thread:
It's not possible to stream to partitions beyond [7 days in the past, 3 days in the future]. This is a performance related limitation, and the team is working to remove it.
Workaround: Stream your data to a non-partitioned table, and from there insert into the partitioned one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With