Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

bq.py Not Paging Results

We're working on writing a wrapper for bq.py and are having some problems with result sets larger than 100k rows. It seems like in the past this has worked fine (we had related problems with Google BigQuery Incomplete Query Replies on Odd Attempts). Perhaps I'm not understanding the limits explained on the doc page?

For instance:

#!/bin/bash

for i in `seq 99999 100002`;
do
    bq query -q --nouse_cache --max_rows 99999999 "SELECT id, FROM [publicdata:samples.wikipedia] LIMIT $i" > $i.txt
    j=$(cat $i.txt | wc -l)
    echo "Limit $i Returned $j Rows"
done

Yields (note there are 4 lines of formatting):

Limit 99999 Returned   100003 Rows
Limit 100000 Returned   100004 Rows
Limit 100001 Returned   100004 Rows
Limit 100002 Returned   100004 Rows

In our wrapper, we directly access the API:

while row_count < total_rows:
    data = client.apiclient.tabledata().list(maxResults=total_rows - row_count,
                                                 pageToken=page_token,
                                                 **table_dict).execute()

    # If there are more results than will fit on a page, 
    # you will recieve a token for the next page
    page_token = data.get('pageToken', None)

    # How many rows are there across all pages?
    total_rows = min(total_rows, int(data['totalRows'])) # Changed to use get(data[rows],0)
    raw_page = data.get('rows', [])

We would expect to get a token in this case, but none is returned.

like image 719
Jacob Schaer Avatar asked Nov 01 '22 13:11

Jacob Schaer


1 Answers

sorry it took me a little while to get back to you.

I was able to identify a bug that exists server-side, you would end up seeing this with the Java client as well as the python client. We're planning on pushing a fix out this coming week. Your client should start to behave correctly as soon as that happens.

BTW, I'm not sure if you knew this already or not but there's a whole standalone python client that you can use to access the API from python as well. I thought that might be a bit more convenient for you than the client that's distributed as part of bq.py. You'll find a link to it on this page: https://developers.google.com/bigquery/client-libraries

like image 138
Eric Avatar answered Nov 04 '22 06:11

Eric