I was wondering if I could get some insight into how feasible it is to use BigQuery as the primary query engine for an analytics tool we are developing. Our public API will realistically need to perform at minimum hundreds of concurrent SELECT queries via the PHP SDK (on potentially 100M+ rows), but from the current documentation it seems BigQuery is geared more towards infrequent querying than towards serving high-volume, high-load, on-demand queries.
Some of the businesses listed on the Google website appear to be doing similar things, but I have also seen rate-limit figures of 20 concurrent requests, which would seem to rule out this use case for the product?
I'm glad you asked. Normal BigQuery users are subject to concurrent request rate limits, but there is an option that suits exactly the use case you describe: reserved capacity.
With reserved capacity, you get your own "separate cluster" that is not subject to the same limitations, only to the ones you define.
Check https://developers.google.com/bigquery/pricing#reserved_cap for more information.
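For reference, here is a minimal sketch of what a single query looks like through the PHP client, using the current google/cloud-bigquery package (the project, dataset, and table names are placeholders). Each call like this runs as an interactive query job, so without reserved capacity every one of them counts against the per-project concurrent query limit:

```php
<?php
require 'vendor/autoload.php';

use Google\Cloud\BigQuery\BigQueryClient;

// Placeholder project ID -- substitute your own.
$bigQuery = new BigQueryClient([
    'projectId' => 'my-analytics-project',
]);

// Each runQuery() call is an interactive query job and counts
// against the concurrent query quota unless you have reserved capacity.
$queryConfig = $bigQuery->query(
    'SELECT user_id, COUNT(*) AS events
     FROM `my-analytics-project.analytics.events`
     GROUP BY user_id
     LIMIT 100'
);

$results = $bigQuery->runQuery($queryConfig);

foreach ($results as $row) {
    printf("%s: %d\n", $row['user_id'], $row['events']);
}
```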
That's an architectural decision. My personal opinion: I would NOT consider BigQuery if you expect many different users to hit the API concurrently; that would be expensive and risky. I would keep the raw data in BigQuery and work out a mechanism to serve clients more efficiently, for example by caching results or saving snapshots of them in the Datastore or perhaps Cloud SQL. A sketch of that pattern follows.
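Here is a minimal sketch of that cache-aside pattern, assuming a hypothetical Memcached instance in front of BigQuery; the cache key, TTL, and query are illustrative, not a definitive implementation:

```php
<?php
require 'vendor/autoload.php';

use Google\Cloud\BigQuery\BigQueryClient;

// Snapshot freshness window (illustrative value).
const CACHE_TTL_SECONDS = 300;

/**
 * Serve repeated API requests from Memcached; only fall through
 * to BigQuery on a cache miss.
 */
function fetchUserEventCounts(Memcached $cache, BigQueryClient $bigQuery): array
{
    $cacheKey = 'report:user_event_counts'; // hypothetical key scheme

    $cached = $cache->get($cacheKey);
    if ($cached !== false) {
        return $cached; // cache hit: no BigQuery job, no quota consumed
    }

    // Cache miss: run the expensive BigQuery job once ...
    $results = $bigQuery->runQuery($bigQuery->query(
        'SELECT user_id, COUNT(*) AS events
         FROM `my-analytics-project.analytics.events`
         GROUP BY user_id'
    ));

    $rows = [];
    foreach ($results as $row) {
        $rows[] = $row;
    }

    // ... and snapshot the result so concurrent API calls reuse it.
    $cache->set($cacheKey, $rows, CACHE_TTL_SECONDS);

    return $rows;
}

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$bigQuery = new BigQueryClient(['projectId' => 'my-analytics-project']);
$report = fetchUserEventCounts($cache, $bigQuery);
```

The same idea works with Cloud SQL or the Datastore as the snapshot store: client traffic hits the cheap serving layer, and BigQuery only recomputes the snapshot when it expires.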