 

How to download all data in a Google BigQuery dataset?

Is there an easy way to directly download all the data contained in a certain dataset on Google BigQuery? At the moment I'm downloading "as CSV", making one query after another, but that doesn't let me get more than 15k rows at a time, and I need to download over 5M rows. Thank you

asked Aug 28 '13 by mark



2 Answers

You can run BigQuery extraction jobs using the Web UI, the command line tool, or the BigQuery API. The data can be extracted to a Google Cloud Storage bucket.

For example, using the command line tool:

First, install the tool and authenticate using these instructions: https://developers.google.com/bigquery/bq-command-line-tool-quickstart

Then make sure you have an available Google Cloud Storage bucket (you can create one from the Google Cloud Console).
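
If you prefer the command line, a bucket can also be created with gsutil; a minimal sketch, where the bucket name is a placeholder:

# Create a Cloud Storage bucket to hold the export (name is a placeholder).
gsutil mb gs://mybucket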

Then, run the following command:

bq extract my_dataset.my_table gs://mybucket/myfilename.csv
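
For millions of rows it is worth compressing the output and letting BigQuery shard it across files; a variant of the same command using flags documented for the bq tool (bucket and table names are placeholders):

# GZIP-compress the export; the * lets BigQuery split it into multiple files.
bq extract --compression=GZIP --destination_format=CSV my_dataset.my_table 'gs://mybucket/myfilename_*.csv.gz'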

More on extracting data via API here: https://developers.google.com/bigquery/exporting-data-from-bigquery
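
For instance, here is a minimal sketch of submitting an extract job through the BigQuery v2 REST API with curl, assuming gcloud is installed to mint the OAuth token; the project, dataset, table, and bucket names are placeholders:

# Submit an extract job via the BigQuery v2 REST API (jobs.insert).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://www.googleapis.com/bigquery/v2/projects/PROJECT/jobs \
  -d '{
    "configuration": {
      "extract": {
        "sourceTable": {
          "projectId": "PROJECT",
          "datasetId": "DATASET",
          "tableId": "TABLE"
        },
        "destinationUris": ["gs://mybucket/export_*.csv.gz"],
        "destinationFormat": "CSV",
        "compression": "GZIP"
      }
    }
  }'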

answered Oct 31 '22 by Michael Manoochehri


Detailed step-by-step guide to downloading a large query output

  1. enable billing

    You have to give your credit card number to Google to export the output, and you might have to pay.

    But the free quota (1TB of processed data) should suffice for many hobby projects.

  2. create a project

  3. associate billing to a project

  4. run your query

  5. create a new dataset

  6. click "Show options" and enable "Allow Large Results" if the output is very large

  7. export the query result to a table in the dataset (see the command sketch after this list)

  8. create a bucket on Cloud Storage.

  9. export the table to the created bucket on Cloud Storage (also shown in the sketch after this list).

    • make sure to select GZIP compression

    • use a name like <bucket>/prefix.gz.

      If the output is very large, the file name must have an asterisk * and the output will be split into multiple files.

  10. download the table from Cloud Storage to your computer.

    It does not seem possible to download multiple files from the web interface if the large file got split up, but you can install gsutil and run:

    gsutil -m cp -r 'gs://<bucket>/prefix_*' .
    

    See also: Download files and folders from Google Storage bucket to a local folder

    There is a gsutil package in Ubuntu 16.04, but it is unrelated.

    You must install and set it up as documented at: https://cloud.google.com/storage/docs/gsutil

  11. unzip locally:

    for f in *.gz; do gunzip "$f"; done
    

Here is a sample project for which I needed this, and which motivated this answer.