Partitioning by date?

We are experimenting with BigQuery to analyze user data generated by our software application.

Our working table consists of hundreds of millions of rows, each representing a unique user "session" and containing a timestamp, a UUID, and other fields describing the user's interaction with our product during that session. We currently generate about 2 GB of data (~10M rows) per day.

Every so often we may run queries against the entire dataset (about 2 months' worth right now, and growing). However, typical queries will span just a single day, week, or month. We're finding that as our table grows, our single-day query becomes more and more expensive (as we would expect given BigQuery's architecture).

What is the best way to query subsets of our data more efficiently? One approach I can think of is to "partition" the data into separate tables by day (or week, month, etc.) and then query them together in a union:

SELECT foo from mytable_2012-09-01, mytable_2012-09-02, mytable_2012-09-03;

Is there a better way than this?

David M Smith asked Sep 14 '12 23:09



1 Answer

BigQuery now supports table partitions by date:

https://cloud.google.com/blog/big-data/2016/03/google-bigquery-cuts-historical-data-storage-cost-in-half-and-accelerates-many-queries-by-10x
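As a rough sketch of what querying such a table can look like: assuming the data lives in an ingestion-time partitioned table (the name `mydataset.mytable` is hypothetical; `foo` is the column from the question), a filter on the `_PARTITIONTIME` pseudo-column should limit the scan to the matching daily partitions instead of the whole table:

```sql
-- Standard SQL sketch; `mydataset.mytable` is a hypothetical
-- ingestion-time partitioned table, not a name from the question.
SELECT foo
FROM `mydataset.mytable`
-- Restrict the scan to the partitions for 2012-09-01 through 2012-09-03.
WHERE _PARTITIONTIME >= TIMESTAMP('2012-09-01')
  AND _PARTITIONTIME <  TIMESTAMP('2012-09-04');
```

This covers the same three days as the union in the question while keeping a single table to load into and manage.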

Graham Polley answered Oct 01 '22 00:10