 

Bad Request error while querying data from bigquery in a loop

I am querying data from BigQuery in a loop, using the get_data_from_bq method shown below:

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()

While the first query (iteration) returns the correct data, all subsequent queries throw the exception below:

    results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f

Edit 1: Below is a sample query that throws the above exception. It also runs fine in the BigQuery console.

select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;
Ishank Gulati asked Jun 13 '18


People also ask

How do you handle errors in BigQuery?

BigQuery hasn't documented it prominently yet, but you can handle any type of exception in BigQuery by adding an exception handling clause, as in the following example:

BEGIN
  SELECT 1/0;
EXCEPTION WHEN ERROR THEN
  SELECT @@error.message;
END;

How do you use loops in BigQuery?

Example using BigQuery CONTINUE and BREAK:

DECLARE x INT64 DEFAULT 0;
LOOP
  SET x = x + 1;
  IF x >= 10 THEN
    CONTINUE;
  END IF;
  BREAK;
END LOOP;
SELECT x;

Note that ITERATE is a synonym for CONTINUE and LEAVE is a synonym for BREAK; you can use them interchangeably.

How do you declare a variable in BigQuery?

BigQuery also supports system variables. You do not need to declare system variables, but you can set any of them that are not marked read-only, and you can reference them in queries. For example, DECLARE x INT64; initializes the variable x as an INT64 with the value NULL.
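As a small illustrative sketch (the google-cloud-bigquery client is assumed here, matching the answers below, and the script itself is only an example), a declared variable can be used from Python like any other query:

from google.cloud import bigquery

client = bigquery.Client()

# DECLARE without a DEFAULT starts x as NULL; SET gives it a value before the SELECT.
script = """
DECLARE x INT64;
SET x = 42;
SELECT x AS value;
"""

for row in client.query(script).result():
    print(row["value"])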


2 Answers

I had the exact same issue. The problem is not the query itself; it's that you are most likely reusing the same QueryJobConfig. When you perform a query, unless you set a destination, BigQuery stores the result in an anonymous table, and that table gets recorded in the QueryJobConfig object. If you reuse this configuration, BigQuery tries to store the new result in the same anonymous table, hence the error. I don't particularly like this behaviour, to be honest.

You should rewrite your code like this:

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    # Use a fresh QueryJobConfig for every call instead of a shared one
    query_job = bigquery_client.query(query, job_config=QueryJobConfig())
    return query_job.result()
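One caveat: if the shared job_config carried options such as use_legacy_sql=True (which the bracketed [xyz:xyz.abc] table syntax suggests), those options have to be re-applied to each fresh config. A minimal sketch of that variant, assuming bigquery_client and time_thresh are defined as in the question:

from google.cloud import bigquery

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    # Fresh config per call avoids reusing the previous anonymous destination table,
    # while use_legacy_sql is set again so the bracketed table reference keeps working.
    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = True
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()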

Hope this helps!

Federico Bertola answered Oct 16 '22


Edited:

Federico Bertola is correct about the solution and about the temporary table that BigQuery writes results to; see this link.

I did not get an error with my sample code querying a public table last time, but I can reproduce the error today, so this symptom can appear intermittently. I can confirm the error is resolved with Federico's suggestion.


You can get the "super(QueryJob, self).result(timeout=timeout)" error when the query string lacks quotes around the parameters in the query. It seems you have made a similar mistake with the parameter format_strings in your query. You can fix this problem by ensuring that the quotes around the parameter are escaped:

(" + myparam + ")

, should be written as

(\"" + myparam + "\")

You should examine your query string where you use parameters, and start with a simpler query such as

select productId, eventType, count(*) as count from `xyz:xyz.abc`

, and grow your query as you go.
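A related option that avoids hand-escaping quotes altogether is to use query parameters instead of string concatenation (standard SQL only). A rough sketch, with get_word_counts as a hypothetical helper and the public Shakespeare sample table borrowed from the example below:

from google.cloud import bigquery

client = bigquery.Client()

def get_word_counts(words):
    # An array parameter replaces the hand-built, hand-quoted IN (...) list.
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = [
        bigquery.ArrayQueryParameter("words", "STRING", words)
    ]
    query = "SELECT word, SUM(word_count) AS count FROM `publicdata.samples.shakespeare` WHERE word IN UNNEST(@words) GROUP BY word;"
    return client.query(query, job_config=job_config).result()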

For the record, here is what worked for me:

from google.cloud import bigquery
client = bigquery.Client()
job_config = bigquery.QueryJobConfig()

def get_data_from_bq(myparam):  
    query = "SELECT word, SUM(word_count) as count FROM `publicdata.samples.shakespeare` WHERE word IN (\""+myparam+"\") GROUP BY word;"
    query_job = client.query(query, job_config=job_config) 
    return query_job.result()

mypar = "raisin"
x = 1
while (x<9):
    iterator = get_data_from_bq(mypar)
    print "==%d iteration==" % x
    x += 1
Kevin Chien answered Oct 16 '22