BigQuery maximum query length (characters): workaround

First let me explain the problem. I have 500 unique users. The data for each user is split into smaller gzip files (let's say 25 files per user on average). We have loaded each split gzip file as a separate table in BigQuery, so our dataset contains roughly 13,000 tables.

Now, we have to run time-range queries to retrieve some data for each user. We have around 500-1000 different time ranges per user, and we would like to combine all of these time ranges into a single query with logical OR and AND:

  WHERE (timestamp > 2 AND timestamp < 3) OR (timestamp > 4 AND timestamp < 5) OR .............. and so on 1000 times

and run it against all 13,000 tables.

Our own tests suggest that BigQuery has a query length limit of around 10,000 characters.

If we split the conditions into multiple queries, we exceed the 20,000-queries-per-day quota limit.

Is there any workaround so that we can run these queries without hitting the daily quota limit?

Thanks

JR

asked Jun 25 '14 by user1302884



2 Answers

I faced a similar issue with BigQuery's SQL query length limit (1,024K characters) when passing a large array in a WHERE condition.

To resolve it, I used a parameterized query: https://cloud.google.com/bigquery/docs/parameterized-queries
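As a minimal sketch of that approach using the google-cloud-bigquery Python client: the long list is passed as an array query parameter instead of being inlined into the SQL text. The project, dataset, table, and column names here are placeholders, not from the original question.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder values: a long list that would otherwise be inlined into the SQL.
user_ids = ["user_001", "user_002", "user_003"]  # potentially thousands of entries

sql = """
    SELECT *
    FROM `my_project.my_dataset.my_table`
    WHERE user_id IN UNNEST(@ids)
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        # The array is sent as a query parameter, so the list itself
        # is not part of the SQL text.
        bigquery.ArrayQueryParameter("ids", "STRING", user_ids),
    ]
)

rows = client.query(sql, job_config=job_config).result()
for row in rows:
    print(row)
```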

answered Sep 18 '22 by sid


I can think of two things:

  • Try reducing the number of tables in the dataset. If they share the same schema, could they be combined (denormalized) into a single table, or at least into fewer tables?

I have loaded 500,000+ JSON gzip files into one table, and querying is much easier (see the load sketch after this list).

  • For the timestamp conditions, you can try using a shorter common expression.

For example, instead of

WHERE (timestamp > "2014-06-25 00:00:00" AND timestamp < "2014-06-26 00:00:00")

you could write

WHERE LEFT(timestamp,10) = "2014-06-25"

Hopefully this reduces your query's character count as well.
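To illustrate the first suggestion, here is a rough sketch of loading many gzip JSON files into a single table with one load job via the Python client, rather than creating one table per file. The bucket path, dataset, and table names are hypothetical, and it assumes the files are newline-delimited JSON that already contain (or can be given) a user identifier column so per-user filtering still works.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source files; adjust to your project.
table_id = "my_project.my_dataset.all_users"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # or pass an explicit schema instead
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# A wildcard URI lets one load job pick up many gzip files at once;
# gzip-compressed JSON is decompressed automatically during the load.
load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/*.json.gz",
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the load to finish

print(client.get_table(table_id).num_rows, "rows loaded")
```

With everything in one table, the original 13,000-table fan-out disappears and each time-range query only has to hit a single table.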

answered Sep 19 '22 by Wan Bachtiar