Partitioning by date?

Tags:

google-bigquery

We are experimenting with BigQuery to analyze user data generated by our software application.

Our working table consists of hundreds of millions of rows, each representing a unique user "session" and containing a timestamp, UUID, and other fields describing the user's interaction with our product during that session. We currently generate about 2GB of data (~10M rows) per day.

Every so often we may run queries against the entire dataset (about 2 months' worth right now, and growing). However, typical queries will span just a single day, week, or month. We're finding that as our table grows, our single-day query becomes more and more expensive (as we would expect given BigQuery's architecture).

What is the best way to query subsets of our data more efficiently? One approach I can think of is to "partition" the data into separate tables by day (or week, month, etc.) and then query them together in a union:

SELECT foo from mytable_2012-09-01, mytable_2012-09-02, mytable_2012-09-03;

Is there a better way than this?

697

asked Sep 14 '12 23:09

David M Smith

1 Answer

BigQuery now supports table partitions by date:

https://cloud.google.com/blog/big-data/2016/03/google-bigquery-cuts-historical-data-storage-cost-in-half-and-accelerates-many-queries-by-10x
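As a rough sketch of what this can look like in current standard SQL, assuming a table partitioned on a timestamp column: the dataset, table, and column names below (`mydataset.sessions`, `session_id`, `session_start`) are made up for illustration, with `foo` borrowed from the question. Filtering on the partitioning column with constant bounds should let BigQuery scan only the matching daily partitions rather than the whole table.

```sql
-- Hypothetical names: mydataset.sessions, session_id, session_start.
-- Create a table with one partition per day of session_start.
CREATE TABLE mydataset.sessions (
  session_id STRING,
  session_start TIMESTAMP,
  foo STRING
)
PARTITION BY DATE(session_start);

-- Query three days of data. The constant range filter on the
-- partitioning column limits the scan to those daily partitions.
SELECT foo
FROM mydataset.sessions
WHERE session_start >= TIMESTAMP('2012-09-01')
  AND session_start < TIMESTAMP('2012-09-04');
```

Tables partitioned by ingestion time rather than by a column expose a `_PARTITIONTIME` pseudo-column instead, and the same kind of range filter can be applied to it.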

110

answered Oct 01 '22 00:10

Graham Polley
