Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery JDBC driver won't return more than 100,000 rows

I am using the starschema JDBC driver for Google BigQuery in Pentaho PDI:

http://code.google.com/p/starschema-bigquery-jdbc/

My query through the BigQuery Web Console returns 129,993 rows, but when I execute the same query through the JDBC driver it only returns 100,000 rows. Is there some kind of option or limit that I am not aware of?

like image 935
user2019523 Avatar asked Jan 22 '26 02:01

user2019523


2 Answers

The StarSchema code looks like it is only returning the first page of results.

The code here here should be updated to get the rest of the results. It should look something like:

public static GetQueryResultsResponse getQueryResults(Bigquery bigquery,
        String projectId, Job completedJob) throws IOException {        
    GetQueryResultsResponse queryResult = bigquery.jobs()
            .getQueryResults(projectId,
                    completedJob.getJobReference().getJobId()).execute();
    while(queryResult.getTotalRows() > queryResult.getRows().size()) {
        queryResult.getRows().addAll(
            bigquery.jobs()
                .getQueryResults(projectId,
                        completedJob.getJobReference().getJobId())
                .setStartIndex(queryResult.getRows().size())
                .execute()
                .getRows());            
    }
    return queryResult;
}
like image 136
Jordan Tigani Avatar answered Jan 24 '26 00:01

Jordan Tigani


Modified the code based on Jordan's answer, the solution looks like this:

    public static GetQueryResultsResponse getQueryResults(Bigquery bigquery,
        String projectId, Job completedJob) throws IOException {
    GetQueryResultsResponse queryResult = bigquery.jobs()
            .getQueryResults(projectId,
                    completedJob.getJobReference().getJobId()).execute();
    long totalRows = queryResult.getTotalRows().longValue();
    if(totalRows == 0){ 
//if we don't have results we'll get a nullPointerException on the queryResult.getRows().size()
        return queryResult;
    }
    while( totalRows  > (long)queryResult.getRows().size() ) {
        queryResult.getRows().addAll(
            bigquery.jobs()
                .getQueryResults(projectId,
                        completedJob.getJobReference().getJobId())
                .setStartIndex(BigInteger.valueOf((long)queryResult.getRows().size()) )
                .execute()
                .getRows());           
    }
    return queryResult;
}

This should solve the problem. Also uploaded the new version to google code, named bqjdbc-1.3.1.jar

like image 23
Balazs Gunics Avatar answered Jan 24 '26 00:01

Balazs Gunics