 

Python: How to update a value in Google BigQuery in less than 40 seconds?

I have a table in Google BigQuery that I access and modify in Python using the pandas functions read_gbq and to_gbq. The problem is that appending 100,000 rows takes about 150 seconds, while appending a single row takes about 40 seconds. Instead of appending a row, I would like to update a value in the table. Is there a way to update a value in the table using Python that is very fast, or at least faster than 40 seconds?
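
For reference, the append workflow described above looks roughly like this (a minimal sketch; the project ID, table name, and column names are placeholders):

import pandas as pd

# Read the existing table, then append a single new row with to_gbq.
df = pd.read_gbq('SELECT * FROM dataset.table', project_id='my-project')
new_row = pd.DataFrame({'field_1': ['3'], 'field_2': ['1']})
new_row.to_gbq('dataset.table', project_id='my-project', if_exists='append')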

asked Jul 10 '17 by user1367204


People also ask

Can you update records in BigQuery?

The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables. You can execute DML statements just as you would a SELECT statement, with the following conditions: You must use Google Standard SQL.

How do you overwrite data in BigQuery?

To append to or overwrite a table using query results, specify a destination table and set the write disposition to either: Append to table — Appends the query results to an existing table. Overwrite table — Overwrites an existing table with the same name using the query results.
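
In Python the same idea can be expressed with the google-cloud-bigquery client, for example (a minimal sketch, assuming a current library version; the project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Send the query results to an existing destination table; WRITE_APPEND
# appends to it, WRITE_TRUNCATE overwrites it.
job_config = bigquery.QueryJobConfig(
    destination='my-project.dataset.destination_table',
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query('SELECT * FROM dataset.source_table', job_config=job_config).result()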

How do you refresh a dataset in BigQuery?

To update to the latest BigQuery data, at the bottom of the pivot table, click Refresh.

Does BigQuery have quota for update?

By default, BigQuery quotas and limits apply on a per-project basis. Quotas and limits that apply on a different basis are indicated as such; for example, the maximum number of columns per table, or the maximum number of concurrent API requests per user.


1 Answer

I'm not sure you can do this using pandas, but you certainly can using the google-cloud library.

You can just install it (pip install --upgrade google-cloud) and run it like this:

import uuid
import os

# Point the client at the service account credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path_to_json_credentials.json'
from google.cloud.bigquery.client import Client

bq_client = Client()

# Each job needs a unique job name.
job_id = str(uuid.uuid4())
query = """UPDATE `dataset.table` SET field_1 = '3' WHERE field_2 = '1'"""
job = bq_client.run_async_query(query=query, job_name=job_id)
job.use_legacy_sql = False  # DML statements require standard SQL.
job.begin()  # Starts the job without blocking.

Here this operation takes about 2 seconds on average.

As a side note, it's important to keep in mind the quotas that apply to DML operations in BigQuery, that is, to know when it's appropriate to use them and whether they fit your needs well.
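
If you are on a newer version of the google-cloud-bigquery library, run_async_query has been replaced by client.query; a minimal sketch of the same update with the current API (the table and column names mirror the placeholders above):

from google.cloud import bigquery

client = bigquery.Client()

query = """UPDATE `dataset.table` SET field_1 = '3' WHERE field_2 = '1'"""
query_job = client.query(query)  # Standard SQL is the default here.
query_job.result()  # Waits for the DML statement to finish.
print(query_job.num_dml_affected_rows)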

answered Sep 29 '22 by Willian Fuks