 

BigQuery Pagination

How do I do pagination with BigQuery when using JavaScript?

First I send a request:

var request = gapi.client.bigquery.jobs.query({
    'projectId': project_id,
    'timeoutMs': '30000',
    'query': query,
    'maxResults': 50,
    'pageToken': pageToken
});

This query returns the first 50 results; how can I retrieve the next 50? I want to do the pagination dynamically using JavaScript and BigQuery.

The query I am using:

SELECT year, month, day, state, mother_age, AVG(weight_pounds) AS AvgWeight
FROM [publicdata:samples.natality]
GROUP EACH BY year, month, day, state, mother_age

Asked by Innovation

2 Answers

TableData.list works, or alternatively you can use jobs.getQueryResults(), which is usually the preferred way to get query results (since it can also wait for the query to complete).

You should use the page token returned from the original query response or the previous jobs.getQueryResults() call to iterate through pages. This is generally more efficient and reliable than using index-based pagination.

I don't have a JavaScript example, but here is an example using Python that should be relatively easy to adapt:

from apiclient.discovery import build


def run_query(http, service, project_id, query, response_handler,
              timeout=30 * 1000, max_results=1024):
  query_request = {
      'query': query,
      'timeoutMs': timeout,
      'maxResults': max_results}
  print('Running query "%s"' % (query,))
  response = service.jobs().query(projectId=project_id,
                                  body=query_request).execute(http)
  job_ref = response['jobReference']

  get_results_request = {
      'projectId': project_id,
      'jobId': job_ref['jobId'],
      'timeoutMs': timeout,
      'maxResults': max_results}

  while True:
    print('Response %s' % (response,))
    page_token = response.get('pageToken')
    query_complete = response['jobComplete']
    if query_complete:
      response_handler(response)
      if page_token is None:
        # Our work is done: the query is complete and there are no more
        # results to read.
        break
    # Set the page token so that we know where to start reading from.
    get_results_request['pageToken'] = page_token
    # Unpack the get_results_request dict into keyword arguments.
    response = service.jobs().getQueryResults(
        **get_results_request).execute(http)


def print_results(results):
  fields = results['schema']['fields']
  rows = results.get('rows', [])
  for row in rows:
    for i in range(len(fields)):
      cell = row['f'][i]
      field = fields[i]
      print('%s: %s ' % (field['name'], cell['v']), end='')
    print('')


def run(http, query):
  service = build('bigquery', 'v2')
  project_id = '#Your Project Here#'
  run_query(http, service, project_id, query, print_results,
            timeout=1)
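
A rough JavaScript adaptation of the same loop, as the question asks for (a sketch only: it assumes the gapi BigQuery client from the question is loaded and authorized, and the helper names runPagedQuery and handlePage are illustrative; the response fields jobComplete, pageToken and jobReference come from the same v2 API used above):

// Sketch: run a query and walk through its result pages with the gapi
// client from the question. Assumes gapi.client.load('bigquery', 'v2')
// has completed and the user is authorized.
function runPagedQuery(projectId, query, handleRows) {
  gapi.client.bigquery.jobs.query({
    'projectId': projectId,
    'timeoutMs': '30000',
    'query': query,
    'maxResults': 50
  }).execute(function (response) {
    handlePage(projectId, response, handleRows);
  });
}

function handlePage(projectId, response, handleRows) {
  if (response.jobComplete && response.rows) {
    // Hand one page of rows (plus the schema) to the caller.
    handleRows(response.schema.fields, response.rows);
  }
  if (response.jobComplete && !response.pageToken) {
    // The query is done and no pageToken was returned: nothing left to read.
    return;
  }
  // Otherwise fetch the next page (or keep polling until the job completes)
  // with jobs.getQueryResults, passing the pageToken from the last response.
  var params = {
    'projectId': projectId,
    'jobId': response.jobReference.jobId,
    'timeoutMs': '30000',
    'maxResults': 50
  };
  if (response.pageToken) {
    params['pageToken'] = response.pageToken;
  }
  gapi.client.bigquery.jobs.getQueryResults(params).execute(function (next) {
    handlePage(projectId, next, handleRows);
  });
}

Each call to handleRows then receives one page of at most 50 rows, which you can append to the page as the user clicks "next".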
Answered by Jordan Tigani

Once the query has run, all results are saved to a temporary table (or a permanent one, if you have set the respective flag).

You can read these results with tabledata.list. Notice that it offers a startIndex argument, so you can jump to any arbitrary page, not only the next one.

https://developers.google.com/bigquery/docs/reference/v2/tabledata/list
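
For example, a minimal sketch of jumping straight to page N with startIndex, assuming the gapi client from the question and that you have looked up the query's destination table (it appears under configuration.query.destinationTable when you fetch the job with jobs.get); fetchPage and its arguments are illustrative:

// Sketch: read an arbitrary page of a table with tabledata.list.
// tableRef = {projectId: ..., datasetId: ..., tableId: ...} is the
// destination table of the finished query job (placeholder values).
function fetchPage(tableRef, pageNumber, pageSize, handleRows) {
  gapi.client.bigquery.tabledata.list({
    'projectId': tableRef.projectId,
    'datasetId': tableRef.datasetId,
    'tableId': tableRef.tableId,
    // startIndex lets you jump straight to page N instead of walking
    // page tokens one at a time.
    'startIndex': String(pageNumber * pageSize),
    'maxResults': pageSize
  }).execute(function (response) {
    handleRows(response.rows || []);
  });
}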

Answered by Felipe Hoffa