
Wildcard on day table vs time partition

I am trying to understand whether there is a difference in BigQuery (in cost or in what can be queried, for example) between:

  • Creating one table per day (like my_table_2018_02_06)
  • Creating a time-partitioned table (my-table, partitioned by day).

Thanks !

jeremieca asked Feb 06 '18 10:02



1 Answer

Short explanation: querying multiple tables with Wildcard Tables was the proposed alternative when BigQuery had no partitioning mechanism available. The natural evolution was the Partitioned Tables feature, and there is currently an alpha release of column-based time partitioning, i.e. letting the user define which column (of DATE or TIMESTAMP data type) is used for the partitioning.

BigQuery engineers are currently working on adding new features to table partitioning rather than to the legacy Wildcard Tables methodology, so I'd suggest that you work with Partitioned Tables.


Long explanation: you are comparing two approaches that serve the same purpose but have different implications:

  • Wildcard Tables: some time ago, when table partitioning was not a feature supported by BigQuery, Wildcard Tables were the way to query multiple tables using concise SQL queries. A Wildcard Table represents the union of all the tables that match the wildcard expression specified in the SQL statement. However, Wildcard Tables have some limitations, such as:
    • Do not support views.
    • Do not support cached results (queries containing wildcard tables are billed every time they are run, even if the "cached results" option is checked).
    • Only work with native BigQuery storage (cannot work with external tables [Bigtable, Storage or Drive]).
    • Only available in standard SQL.
  • Partitioned Tables: these are single tables that are divided into segments (partitions), split by date. There is plenty of documentation on how to work with Partitioned Tables. Regarding pricing, each partition in a Partitioned Table is treated as an independent entity, so if a partition has not been updated in the last 90 days, its data is considered long-term storage and is billed at the corresponding discount (as would happen with a normal table). Finally, Partitioned Tables are here to stay, so more features are coming to them, such as column-based partitioning, which is currently in alpha; you can follow its status in this Public Issue Tracker post. On the other hand, there are also some current limitations to consider:
    • Maximum of 2500 partitions per Partitioned Table.
    • Maximum of 2000 partition updates per table per day.
    • Maximum of 50 partition updates every 10 seconds.
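
To make the two approaches concrete, here is a minimal Python sketch that only builds the standard SQL text for each case (it does not call BigQuery). The project, dataset and table names are hypothetical, the day tables are assumed to follow the my_table_YYYY_MM_DD naming from the question, and the partitioned table is assumed to be ingestion-time partitioned so that the _PARTITIONTIME pseudo column is available.

```python
from datetime import date


def wildcard_query(project: str, dataset: str, prefix: str,
                   start: date, end: date) -> str:
    """Standard SQL over day tables named <prefix>YYYY_MM_DD.

    _TABLE_SUFFIX holds whatever the `*` matched, so a string range on it
    selects which day tables are scanned.
    """
    return (
        "SELECT COUNT(*) AS n\n"
        f"FROM `{project}.{dataset}.{prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y_%m_%d}' AND '{end:%Y_%m_%d}'"
    )


def partitioned_query(project: str, dataset: str, table: str,
                      start: date, end: date) -> str:
    """Standard SQL over one ingestion-time partitioned table.

    Filtering on _PARTITIONTIME restricts the scan to the matching
    daily partitions.
    """
    return (
        "SELECT COUNT(*) AS n\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        f"WHERE _PARTITIONTIME BETWEEN TIMESTAMP('{start:%Y-%m-%d}') "
        f"AND TIMESTAMP('{end:%Y-%m-%d}')"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(wildcard_query("my-project", "my_dataset", "my_table_",
                         date(2018, 2, 1), date(2018, 2, 6)))
    print(partitioned_query("my-project", "my_dataset", "my_table",
                            date(2018, 2, 1), date(2018, 2, 6)))
```

In both cases the date filter is what keeps the scanned (and billed) data down to the days of interest; the difference is that the first query addresses many tables and the second a single one.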

So in general, it is advisable to work with Partitioned Tables rather than with multiple tables queried through Wildcard Tables. However, you should always consider your use case and see which of the two options meets your requirements better.
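
As a rough way to check a use case against the Partitioned Table limits listed above, the following Python sketch does the arithmetic. The constants are the figures quoted in this answer and may have changed since; the function names are made up for illustration.

```python
# Limits as quoted in the answer above; verify against current quotas.
MAX_PARTITIONS_PER_TABLE = 2500
MAX_PARTITION_UPDATES_PER_DAY = 2000


def years_of_daily_partitions(max_partitions: int = MAX_PARTITIONS_PER_TABLE) -> float:
    """How many years of history fit if there is one partition per day."""
    return max_partitions / 365.25


def fits_limits(retention_days: int, partition_updates_per_day: int) -> bool:
    """True if a daily-partitioned table stays within both quoted limits."""
    return (retention_days <= MAX_PARTITIONS_PER_TABLE
            and partition_updates_per_day <= MAX_PARTITION_UPDATES_PER_DAY)


if __name__ == "__main__":
    print(round(years_of_daily_partitions(), 1))  # about 6.8 years
    print(fits_limits(retention_days=5 * 365, partition_updates_per_day=24))
    print(fits_limits(retention_days=10 * 365, partition_updates_per_day=24))
```

With one partition per day, 2500 partitions cover a little under seven years, so a ten-year retention requirement would not fit in a single daily-partitioned table under the quoted limit.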

dsesto answered Nov 07 '22 02:11
