
BigQuery - Export query results to local file/Google storage

I want to export query results from BigQuery to local file/Google storage.

I tried the 'bq extract' command, but it doesn't accept a query as input.

Usage: bq extract <source_table> <destination_uris>

I don't want to extract the whole table, as it contains many columns that aren't required, and I need to aggregate the data.

As of now, the only workaround I could find is to create a table using the 'bq query' command and then use 'bq extract' to extract the data.
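That workaround looks roughly like this (the temp.dept_sal staging table name is just illustrative):

bq query --use_legacy_sql=false --destination_table=temp.dept_sal \
  'select dept_id, sum(sal) from temp.employee group by dept_id'
bq extract temp.dept_sal 'gs://XXXX/employee.csv'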

I'm looking for a better way to achieve this by doing something like the following:

bq extract 'select dept_id, sum(sal) from temp.employee group by dept_id' 'gs://XXXX/employee.csv'
asked May 23 '18 by Munesh

People also ask

How do I export data from BigQuery to Datastore?

Loading Datastore export service data: In the Google Cloud console, go to the BigQuery page. In the Explorer pane, expand your project, and then select a dataset. In the Dataset info section, click Create table.

Where is Google BigQuery data stored?

BigQuery stores table data in columnar format, meaning it stores each column separately. Column-oriented databases are particularly efficient at scanning individual columns over an entire dataset. Column-oriented databases are optimized for analytic workloads that aggregate data over a very large number of records.

How can I export more than 16000 rows in BigQuery?

If your data has more than 16,000 rows you'd need to save the result of your query as a BigQuery Table. Afterwards, export the data from the table into Google Cloud Storage using any of the available options (such as the Cloud Console, API, bq or client libraries).


3 Answers

Direct export from BigQuery Standard SQL was added recently via the EXPORT DATA statement: Exporting data to CSV format

EXPORT DATA OPTIONS(
  uri='gs://mybucket/myfolder2/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';') AS
SELECT 1 as field1, 2 as field2 
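The same statement can also be run non-interactively from the bq CLI (the bucket path is the one from the example above; adjust it to your own):

bq query --use_legacy_sql=false '
EXPORT DATA OPTIONS(
  uri="gs://mybucket/myfolder2/*.csv",
  format="CSV",
  overwrite=true,
  header=true,
  field_delimiter=";") AS
SELECT 1 AS field1, 2 AS field2'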
answered Oct 09 '22 by Stanislav Kardashov


You can export results using the EXPORT DATA statement, which writes query results directly to GCS:

EXPORT DATA
  [WITH CONNECTION connection_name]
  OPTIONS (export_option_list) AS
  query_statement
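Applied to the query from the question, that would look something like this (the bucket path and file naming are placeholders):

EXPORT DATA OPTIONS(
  uri='gs://XXXX/employee_*.csv',
  format='CSV',
  overwrite=true,
  header=true) AS
SELECT dept_id, SUM(sal) AS total_sal
FROM temp.employee
GROUP BY dept_id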

Alternatively, you can do it in the Web UI in just three steps:

  1. Configure query to save the results in a BigQuery table and run it.
  2. Export the table to a bucket in GCS.
  3. Download from the bucket.

Step 1

In the BigQuery screen, before running the query, go to More > Query settings.

[Screenshot: Configure Query]

This opens the following

[Screenshot: Query Settings]

Here you want to set:

  • Destination: Set a destination table for query results
  • Project name: select the project.
  • Dataset name: select a dataset. If you don't have one, create it and come back.
  • Table name: give whatever name you want (must contain only letters, numbers, or underscores).
  • Result size: Allow large results (no size limit).

Then save it, and the query is configured to write its results to that specific table. Now you can run the query.

Step 2

To export it to GCS, go to the table and click EXPORT > Export to GCS.

[Screenshot: BigQuery export table]

This opens the following screen

[Screenshot: Export to GCS]

In Select GCS location you define the bucket, the folder and the file.

For instance, suppose you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and you want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location)

daria_bucket/test.csv

If the file is too big (more than 1 GB), you'll get an error. To fix it, you'll have to split the export into multiple files using a wildcard. So you'll need to add *, like this:

daria_bucket/test*.csv

[Screenshot: Wildcard export to GCS]

This stores, inside the bucket daria_bucket, all the data extracted from the table across multiple files named test000000000000.csv, test000000000001.csv, test000000000002.csv, and so on.
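The same wildcard export can be scripted with the bq CLI (the dataset and table names below are assumptions standing in for your own):

bq extract --destination_format=CSV mydataset.mytable 'gs://daria_bucket/test*.csv'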

Step 3

Then go to Storage and you'll see the bucket.

[Screenshot: GCS bucket]

Go inside it and you'll find the file (or files), which you can then download.
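The download can also be scripted with gsutil, using the same paths as in the example above:

gsutil cp 'gs://daria_bucket/test*.csv' .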

answered Oct 09 '22 by Tiago Martins Peres


BigQuery does not provide the ability to directly export/download query results to GCS or to a local file. First you need to get the query result either into an explicitly set destination table or, if none is set, into the temporary (anonymous) table that holds the query result - you can get that table from the respective job's configuration.query.destinationTable attribute (after the job is completed).
Then you can use that table as the source for an export job.
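A rough sketch of that flow with the bq CLI (the job ID and the destination table reference are illustrative placeholders you read from the actual job output):

# start the query asynchronously; bq prints the job id
bq query --nosync --use_legacy_sql=false 'select dept_id, sum(sal) from temp.employee group by dept_id'

# wait for it to finish, then read configuration.query.destinationTable from the job
bq wait <job_id>
bq show --format=prettyjson -j <job_id>

# export the (anonymous) destination table the job points to
bq extract '<project>:<dataset>.<table>' 'gs://XXXX/employee.csv'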

answered Oct 09 '22 by Mikhail Berlyant