Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery Partitioned tables insert data from the past

We want to start using "Partitioned tables" in BQ But documentation(https://cloud.google.com/bigquery/docs/partitioned-tables) says that using "Streaming inserts" possible only

if the partitioning value is up to 7 days in the past, up to 3 days in the future,

In our case, we have some data which could have q partition value more than 7 days in the past.

We save data via BigQuery REST api

Does it mean that we can't use the partition tables or is there some other workaround this? How to save the data which is out of bounds(7days3days) for a partition table?

In general, the idea is: we have a table with streaming data(~100 records per min) and we want to stream data directly to partitions and then use the partitions for analytical queries

like image 597
Nick Bilozerov Avatar asked Mar 12 '18 14:03

Nick Bilozerov


People also ask

How does partitioning work in BigQuery?

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

How would you query specific partitions in a BigQuery table?

If you want to query data based on a time zone other than UTC, choose one of the following options: Adjust for time zone differences in your SQL queries. Use partition decorators to load data into specific ingestion-time partitions, based on a different time zone than UTC.

How do you overwrite a partition in BigQuery?

You can always over-write a partitioned table in BQ using the postfix of YYYYMMDD in the output table name of your query, along with using WRITE_TRUNCATE as your write disposition (i.e. to truncate whatever is existing in that partition and write new results).


1 Answers

To summarize the comment thread:

  • It's not possible to stream to partitions beyond [7 days in the past, 3 days in the future]. This is a performance related limitation, and the team is working to remove it.

  • Workaround: Stream your data to a non-partitioned table, and from there insert into the partitioned one.

like image 147
Felipe Hoffa Avatar answered Sep 18 '22 09:09

Felipe Hoffa