
BigQuery: Partitioning data past 2000 limit (Update: Now 4000 limit) [duplicate]

From the BigQuery page on partitioned tables:

Each table can have up to 2,000 partitions.

We planned to partition our data by day. Most of our queries will be date based, but we have about 5 years of historical data and plan to collect more each day from now on. With only 2,000 partitions, 2000/365 gives us about 5.5 years' worth of data.

What is the best practice for tables that need more than 2,000 partitions?

  • Create a different table per year and join tables when required?
  • Is it possible to partition by week or month instead?
  • Can that 2000 partition limit be increased if you ask support?

Update: Table limit is now 4000 partitions.
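The arithmetic behind the question can be checked quickly — a short sketch comparing how many years of daily partitions fit under the old and new limits:

```python
# How many years of daily partitions fit under BigQuery's
# per-table partition limits?
OLD_LIMIT = 2000
NEW_LIMIT = 4000
DAYS_PER_YEAR = 365

years_old = OLD_LIMIT / DAYS_PER_YEAR  # about 5.5 years
years_new = NEW_LIMIT / DAYS_PER_YEAR  # about 11 years

print(round(years_old, 2), round(years_new, 2))  # 5.48 10.96
```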

asked May 22 '17 17:05 by andy boot

People also ask

How many partitions can you have in a BigQuery table?

BigQuery has a limit of 4,000 partitions per table.

What is partition expiration BigQuery?

If the table's partition expiration is 6 hours, then each partition expires at 06:00:00 UTC the following day. When a partition expires, BigQuery deletes the data in that partition. You can also specify a default partition expiration at the dataset level.
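The expiration rule quoted above can be sketched as follows; this assumes daily granularity and that the expiration clock starts at the partition's end boundary (midnight UTC of the following day), which is how the quoted example works out:

```python
from datetime import datetime, timedelta, timezone

def partition_expiry(partition_date: datetime, expiration: timedelta) -> datetime:
    """Sketch: a daily partition's data expires `expiration` after the
    partition's end boundary (midnight UTC of the following day)."""
    start = partition_date.replace(hour=0, minute=0, second=0, microsecond=0)
    end_boundary = start + timedelta(days=1)
    return end_boundary + expiration

# A 6-hour expiration on the May 22 partition: expires 06:00 UTC on May 23.
d = datetime(2017, 5, 22, tzinfo=timezone.utc)
print(partition_expiry(d, timedelta(hours=6)))  # 2017-05-23 06:00:00+00:00
```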

Can you partition an existing table BigQuery?

As we've mentioned already, BigQuery cannot partition a table using multiple columns (or non-time/non-integer columns). Moreover, partitioned tables are only available through the Standard SQL dialect; legacy SQL cannot query them or write results to them.

What is time partitioning BigQuery?

Ingestion time partitioning: when you create a table partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when BigQuery ingests the data. You can choose hourly, daily, monthly, or yearly granularity for the partitions. Partition boundaries are based on UTC time.
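For daily granularity, the partition a row lands in is identified by the UTC date of ingestion — the same YYYYMMDD id used in partition decorators like `mytable$20170522`. A minimal sketch:

```python
from datetime import datetime, timezone

def daily_partition_id(ingest_time: datetime) -> str:
    """Return the YYYYMMDD partition id for a daily ingestion-time
    partition; boundaries are computed in UTC."""
    return ingest_time.astimezone(timezone.utc).strftime("%Y%m%d")

# A row ingested late on May 22 (UTC) lands in the 20170522 partition.
print(daily_partition_id(datetime(2017, 5, 22, 23, 30, tzinfo=timezone.utc)))  # 20170522
```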


3 Answers

We are in the process of enforcing the 2,000-partition limit (so we documented it a bit early to give our users early notice). So it is a soft limit at this point.

Creating large numbers of partitions has performance implications, so we suggest limiting tables to 2,000 partitions. We have some room here depending on the schema of the table, so it would be reasonable to ask support whether an increase is possible. We will review it depending on how many resources we think would be needed for operations against that table.

We hope to support a larger number of partitions (up to 10K) at some point in the future, but we are working through the design and implementation changes necessary for that (we don't have an ETA on it at this point).

answered Oct 22 '22 16:10 by Pavan Edara


Regarding your question "Is it possible to partition by week or month instead?", there is a feature request to get more flexibility on the type of partitions: https://issuetracker.google.com/issues/35905817

If we are also able to have INT as a partition type, then it would be easy to define monthly partitions in the form YYYYMM.
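The idea above — collapsing dates to an integer YYYYMM key so monthly partitions could be expressed with an integer-typed partition column — can be sketched like this:

```python
from datetime import date

def yyyymm(d: date) -> int:
    """Collapse a date to an integer YYYYMM key, e.g. 2017-05-22 -> 201705.
    Five years of data then needs only ~60 partitions."""
    return d.year * 100 + d.month

print(yyyymm(date(2017, 5, 22)))  # 201705
```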

answered Oct 22 '22 15:10 by Sourygna


The limit is now 4,000 partitions, which is just over 10 years of daily data. However, if you have more than 10 years of data and would like it partitioned by day, one workaround we have used is splitting the table into decades and then writing a view on top that unions the decade tables together.

When querying the view with the date-partitioned field in the WHERE clause, BigQuery knows to process only the required partitions, whether those partitions span multiple decade tables or sit within a single one.

We have used this approach to ensure business users (data analysts and report developers) only need to worry about a single table but still access the performance and cost benefits of partitioned tables.
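The routing behind the decade split can be sketched as follows; the table names (e.g. `events_2010`) are invented for illustration, and in practice a view would UNION ALL the decade tables so analysts query a single name:

```python
from datetime import date

def decade_table(d: date, base: str = "events") -> str:
    """Hypothetical naming scheme: route a row's date to the per-decade
    table holding that decade's daily partitions (max ~3,653 per table,
    safely under the 4,000-partition limit)."""
    return f"{base}_{d.year - d.year % 10}"

print(decade_table(date(2017, 5, 22)))  # events_2010
```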

answered Oct 22 '22 17:10 by Matt Laz