
How to use BigQuery streaming insertAll on App Engine with Python

I would like to develop an App Engine application that streams data directly into a BigQuery table.

According to Google's documentation there is a simple way to stream data into BigQuery:

  • http://googlecloudplatform.blogspot.co.il/2013/09/google-bigquery-goes-real-time-with-streaming-inserts-time-based-queries-and-more.html

  • https://developers.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples (note: in the above link you should select the Python tab rather than the Java one)

Here is the sample code snippet showing how a streaming insert should be coded:

body = {"rows":[
{"json": {"column_name":7.7,}}
]}

response = bigquery.tabledata().insertAll(
   projectId=PROJECT_ID,
   datasetId=DATASET_ID,
   tableId=TABLE_ID,
   body=body).execute()

Although I've downloaded the client API, I couldn't find any reference to the "bigquery" module/object used in Google's example above.

Where should the bigquery object (from the snippet) come from?

Can anyone show a more complete way to use this snippet (with the right imports)?

I've been searching for this a lot and found the documentation confusing and incomplete.

asked Feb 26 '14 by James


People also ask

Does BigQuery support streaming inserts?

Streaming is not available through the free tier. If you attempt to use streaming without enabling billing, you receive the following error: BigQuery: Streaming insert is not allowed in the free tier.
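
In client code, that rejection surfaces as an HttpError raised by the API client, which you can catch. A minimal sketch, where request is a placeholder for an insertAll request built as in the answer below:

from apiclient.errors import HttpError

try:
    # 'request' is a placeholder for a bigquery.tabledata().insertAll(...)
    # request object, built as shown in the answer below.
    response = request.execute()
except HttpError as err:
    # With billing disabled, the streaming insert is rejected with a
    # message like "Streaming insert is not allowed in the free tier".
    print 'Streaming insert failed: %s' % err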

What is BigQuery streaming insert?

To do a real-time insert of records into BigQuery, use the streaming insert feature. Loading records from a file into a BigQuery table is free, but inserting rows via the API is not. Streaming insert is indispensable if your goal is to implement real-time reporting and analytics.
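
For contrast, a batch load goes through a load job rather than insertAll. Here is a hedged sketch, assuming the bigquery service object and ID constants from the answer below and a hypothetical gs:// source file:

# Hypothetical batch alternative: a load job ingests files for free,
# whereas streaming rows through insertAll is billed.
load_body = {
    'configuration': {
        'load': {
            'sourceUris': ['gs://your_bucket/data.csv'],  # hypothetical URI
            'destinationTable': {
                'projectId': PROJECT_ID,
                'datasetId': DATASET_ID,
                'tableId': TABLE_ID,
            },
        }
    }
}
job = bigquery.jobs().insert(projectId=PROJECT_ID, body=load_body).execute()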

What is BigQuery streaming buffer?

  • Streaming buffer: the buffer that retains recently inserted rows, optimized for high-throughput writes rather than columnar access.

  • Instant availability reader: allows the query engine to read records directly from the streaming buffer.

  • Columnar storage: data associated with a table that's in columnar format.


1 Answer

Minimal working example (as long as you fill in the right IDs for your project):

import httplib2
from apiclient import discovery
from oauth2client import appengine

# OAuth scope that authorizes BigQuery access.
_SCOPE = 'https://www.googleapis.com/auth/bigquery'

# Change the following 3 values:
PROJECT_ID = 'your_project'
DATASET_ID = 'your_dataset'
TABLE_ID = 'TestTable'

# Each row to insert is wrapped in a "json" object keyed by column name.
body = {"rows": [
    {"json": {"Col1": 7}}
]}

# Authenticate as the App Engine service account (no user OAuth flow needed).
credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
http = credentials.authorize(httplib2.Http())

# Build the BigQuery v2 service object and stream the row in.
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()

print response
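
Note that insertAll can succeed at the HTTP level while still rejecting individual rows; such per-row failures are reported in the response under insertErrors, so it is worth checking:

# Per-row failures (if any) appear in the 'insertErrors' list of the
# response, each with the failing row's index and its error details.
for error in response.get('insertErrors', []):
    print 'Row %d failed: %s' % (error['index'], error['errors'])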

As Jordan says: "Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset. Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs."
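
For completeness, here is a hedged sketch of running a query with the same service object, reusing the table and column assumed above (this presumes the robot account has the project membership Jordan describes):

# Hypothetical follow-up: run a synchronous query with the same credentials.
query_body = {'query': 'SELECT Col1 FROM [your_dataset.TestTable] LIMIT 10'}
result = bigquery.jobs().query(projectId=PROJECT_ID, body=query_body).execute()
for row in result.get('rows', []):
    print row['f'][0]['v']  # each result cell is wrapped in {'v': value}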

answered Jan 04 '23 by Felipe Hoffa