From the BigQuery page on partitioned tables:
Each table can have up to 2,000 partitions.
We planned to partition our data by day. Most of our queries will be date based, but we have about 5 years of historical data and plan to collect more each day from now on. With a 2,000-partition limit, 2000/365 gives us only about 5.5 years' worth of daily partitions.
What is the best practice for tables that need more than 2,000 partitions?
Update: Table limit is now 4000 partitions.
BigQuery has a limit of 4,000 partitions per table.
For example, if a daily-partitioned table's partition expiration is 6 hours, then each partition expires at 06:00:00 UTC on the day after the partition's date. When a partition expires, BigQuery deletes the data in that partition. You can also specify a default partition expiration at the dataset level.
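As an illustration (a minimal sketch; mydataset and mytable are placeholder names), partition expiration can be set with DDL, where 6 hours corresponds to 0.25 days:

    -- Per-table setting: data in a partition is deleted 6 hours after that partition's day ends
    ALTER TABLE mydataset.mytable
      SET OPTIONS (partition_expiration_days = 0.25);

    -- Dataset-level default, applied to newly created partitioned tables in the dataset
    ALTER SCHEMA mydataset
      SET OPTIONS (default_partition_expiration_days = 0.25);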
BigQuery partitioned table limitations: As mentioned already, it is not possible to partition a table using multiple columns (or non-time/non-integer columns). Moreover, partitioned tables are only available through the Standard SQL dialect; legacy SQL cannot be used to query them or to write results to them.
Ingestion-time partitioning: When you create a table partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when BigQuery ingests the data. You can choose hourly, daily, monthly, or yearly granularity for the partitions. Partition boundaries are based on UTC time.
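As a sketch (the table and column names are placeholder assumptions), an ingestion-time partitioned table can be created with DDL, choosing the granularity through the _PARTITIONTIME / _PARTITIONDATE pseudo-columns:

    -- Daily ingestion-time partitions
    CREATE TABLE mydataset.events_daily (event_id STRING, payload STRING)
    PARTITION BY _PARTITIONDATE;

    -- Monthly ingestion-time partitions
    CREATE TABLE mydataset.events_monthly (event_id STRING, payload STRING)
    PARTITION BY DATE_TRUNC(_PARTITIONTIME, MONTH);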
We are in the process of enforcing the 2,000-partition limit (we documented it a bit early to give our users notice), so it is a soft limit at this point.
Creating large numbers of partitions has performance implications, so we suggest limiting tables to 2,000 partitions. We have some room here depending on the schema of the table, so it would be reasonable to ask support whether an increase is possible. We will review such requests based on how many resources we think would be needed for operations against that table.
We hope to support a larger number of partitions (up to 10,000) at some point in the future, but we are still working through the design and implementation changes necessary for that (we don't have an ETA at this point).
Regarding your question "Is it possible to partition by week or month instead?", there is a feature request to get more flexibility on the type of partitions: https://issuetracker.google.com/issues/35905817
If INT also becomes available as a partitioning type, it would then be easy to define monthly partitions using values of the form YYYYMM.
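For illustration only, integer-range partitioning (which BigQuery now supports) makes exactly this possible; the table, column names, and YYYYMM range below are assumptions:

    -- One range boundary per integer from 201001 through 203012; invalid YYYYMM
    -- values (e.g. 201013) simply never receive rows
    CREATE TABLE mydataset.sales_by_month (
      sale_month INT64,   -- e.g. 201901 for January 2019
      amount NUMERIC
    )
    PARTITION BY RANGE_BUCKET(sale_month, GENERATE_ARRAY(201001, 203013, 1));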
The limit is now 4,000 partitions, which is just under 11 years of daily data. However, if you have more than that and would still like it partitioned by day, one workaround we have used is splitting the table into decades and writing a view on top that unions the decade tables together.
When querying the view with the date-partitioned field in the WHERE clause, BigQuery knows to process only the required partitions, whether they span multiple tables or sit within a single one, as sketched below.
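A minimal sketch of that layout (table, view, and column names are assumptions; each decade table is partitioned on the same date column):

    -- Each decade table stays under the 4,000-partition limit at daily granularity
    CREATE TABLE mydataset.events_2000s (event_date DATE, payload STRING)
    PARTITION BY event_date;

    CREATE TABLE mydataset.events_2010s (event_date DATE, payload STRING)
    PARTITION BY event_date;

    -- Single entry point for analysts; the date filter is pushed down to both tables
    CREATE VIEW mydataset.events AS
    SELECT * FROM mydataset.events_2000s
    UNION ALL
    SELECT * FROM mydataset.events_2010s;

    -- Only the partitions covering January 2012 are scanned
    SELECT COUNT(*) AS n
    FROM mydataset.events
    WHERE event_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';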
We have used this approach to ensure business users (data analysts and report developers) only need to worry about a single table while still getting the performance and cost benefits of partitioned tables.