
Export nested BigQuery data to cloud storage

I am trying to export BigQuery data to a Google Cloud Storage bucket via the API. I adapted a code snippet from here: https://cloud.google.com/bigquery/docs/exporting-data

Job job = table.extract(format, gcsUrl);
// Wait for the job to complete
try {
  Job completedJob = job.waitFor(WaitForOption.checkEvery(1, TimeUnit.SECONDS),
      WaitForOption.timeout(3, TimeUnit.MINUTES));
  if (completedJob != null && completedJob.getStatus().getError() == null) {
    // Job completed successfully
  } else {
    // Handle error case
    System.out.println(completedJob.getStatus().getError());
  }
} catch (InterruptedException | TimeoutException e) {
  // Handle interrupted wait
}

I have replaced format with "JSON", since my data is nested and can't be exported to CSV, and gcsUrl with "gs://mybucket/export_*.json". But the error message tells me the following:

BigQueryError{reason=invalid, location=null, message=Operation cannot be performed on a nested schema. Field: totals}

Any advice on what to do? JSON should be able to handle a nested format...

flowoo asked Jul 04 '17 14:07

People also ask

How can I export more than 16000 rows in BigQuery?

If your data has more than 16,000 rows you'd need to save the result of your query as a BigQuery Table. Afterwards, export the data from the table into Google Cloud Storage using any of the available options (such as the Cloud Console, API, bq or client libraries).

Does BigQuery use cloud storage?

BigQuery supports querying Cloud Storage data in the following formats: comma-separated values (CSV) and JSON (newline-delimited).
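The reason newline-delimited JSON can carry nested data, while CSV cannot, is that each line is a complete JSON object, so a nested record such as the totals field from the error above maps onto it directly. A minimal local sketch (the sample rows here are invented for illustration):

```python
import json

# Two rows as they might look in a newline-delimited JSON export.
# The nested "totals" record has no flat CSV equivalent.
ndjson_export = (
    '{"visitId": 1, "totals": {"hits": 4, "pageviews": 3}}\n'
    '{"visitId": 2, "totals": {"hits": 9, "pageviews": 7}}\n'
)

# Each line parses independently into a full (possibly nested) object.
rows = [json.loads(line) for line in ndjson_export.splitlines()]
print(rows[0]["totals"]["hits"])  # prints 4
```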


2 Answers

Referring to the destinationFormat option, you should set the format variable to "NEWLINE_DELIMITED_JSON" in order to export as JSON.

Elliott Brossard answered Oct 28 '22 03:10


I know this has been marked as solved, but I got the same error while doing it in Python, and the extract_table() method in Python doesn't take a destination_format argument. So for anybody using Python, here is how to export in JSON format:

# One has to pass job_config instead of destination_format.
# Configure the job to export data as newline-delimited JSON.
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

extract_job = client.extract_table(
    table_id,
    destination_uri,
    job_config=job_config,
    location="US",  # Location must match that of the source table.
)

extract_job.result()  # Wait for the export to finish.
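Since the destination URI in the question uses a wildcard (gs://mybucket/export_*.json), a large table may be written out as several shard files. A local sketch of stitching downloaded shards back into one record list (the file names and sample rows are hypothetical):

```python
import glob
import json
import os
import tempfile

# Simulate two downloaded export shards (contents invented for illustration).
workdir = tempfile.mkdtemp()
shards = {
    "export_000000000000.json": '{"visitId": 1, "totals": {"hits": 4}}\n',
    "export_000000000001.json": '{"visitId": 2, "totals": {"hits": 9}}\n',
}
for name, body in shards.items():
    with open(os.path.join(workdir, name), "w") as f:
        f.write(body)

# Glob matches the same pattern as the wildcard URI; each line of
# each shard parses as one record.
records = []
for path in sorted(glob.glob(os.path.join(workdir, "export_*.json"))):
    with open(path) as f:
        records.extend(json.loads(line) for line in f if line.strip())

print(len(records))  # prints 2
```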
Henrique Poleselo answered Oct 28 '22 03:10