First, let me explain the problem. I have 500 unique users. The data from each user is split into smaller gzip files (on average about 25 files per user). We have loaded each split gzip file as a separate table in BigQuery, so our dataset contains some 13,000 tables.
Now we have to run time-range queries to retrieve data for each user. There are around 500-1,000 different time ranges per user, and we would like to combine them all into a single query with logical ORs and ANDs:
WHERE (timestamp > 2 AND timestamp < 3) OR (timestamp > 4 AND timestamp < 5) OR ... and so on, 1,000 times,
and run them across the 13,000 tables.
Our own tests suggest that BigQuery has a query length limit of 10,000 characters; is that correct?
If we split the conditions into multiple queries, we exceed the 20,000 daily query quota.
Is there any workaround so that we can run these queries without hitting the daily quota limit?
Thanks
JR
From the BigQuery quotas documentation: a table, query result, or view definition can have up to 10,000 columns. With on-demand pricing, your project can have up to 2,000 concurrent slots. BigQuery slots are shared among all queries in a single project, and BigQuery might burst beyond this limit to accelerate your queries.
I faced a similar issue with the BigQuery SQL query length limit of 1024K characters when passing a big array of values in a WHERE condition.
To resolve it, I used a parameterized query: https://cloud.google.com/bigquery/docs/parameterized-queries
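A minimal sketch of that approach with the google-cloud-bigquery Python client; the project, dataset, table, and column names are placeholders, not anything from the original question:

    from google.cloud import bigquery

    client = bigquery.Client()

    # The array travels as a query parameter rather than as inline SQL text,
    # so a long value list does not inflate the query length.
    sql = """
        SELECT *
        FROM `my_project.my_dataset.my_table`
        WHERE user_id IN UNNEST(@user_ids)
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("user_ids", "STRING", ["user1", "user2"]),
        ]
    )
    rows = client.query(sql, job_config=job_config).result()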
I can think of two things:
1. Load everything into one table. I have loaded 500,000+ JSON gzip files into a single table, and querying is much easier (see the load sketch below).
2. Shorten each condition. For example, instead of
WHERE (timestamp > "2014-06-25 00:00:00" AND timestamp < "2014-06-26 00:00:00")
you could write
WHERE LEFT(timestamp,10) = "2014-06-25"
Hopefully this shortens the conditions enough to keep you under the query length limit as well.
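For the first suggestion, here is a rough sketch of consolidating many gzipped newline-delimited JSON files into one table, again with the Python client; the bucket path, destination table, and use of schema autodetection are assumptions for illustration:

    from google.cloud import bigquery

    client = bigquery.Client()

    # One load job with a wildcard URI picks up all the split files;
    # BigQuery decompresses gzipped JSON automatically on load.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://my_bucket/user_data/*.json.gz",
        "my_project.my_dataset.all_users",
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish

With the data consolidated (and, say, a user ID column to filter on), one query can cover what previously took thousands, which also helps with the daily quota.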