How to load multiple files (same schema) into a table in BigQuery?

I have a folder of CSV files, all with the same schema, that I want to load into a single BigQuery table.

Is there an option to give a folder path as the input to the bq command so that everything in it is loaded into the table? I'd like to know whether this can be done without iterating over the files or merging them at the source.

user1311888 asked Feb 17 '17 16:02


3 Answers

If using Cloud Storage is an option, you can put all the files under a common prefix in a bucket and use a wildcard, e.g. gs://my_bucket/some/path/files*, to specify a single load job with multiple inputs.
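A minimal sketch of that workflow, assuming gsutil and bq are installed and authenticated and that the destination table already exists (otherwise a schema would have to be supplied). The folder, bucket, prefix, and table names are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is only an illustration so the commands can be inspected before anything is executed:

```shell
#!/bin/sh
set -eu

# Hypothetical names; substitute your own.
LOCAL_DIR="./csv_folder"
BUCKET="my_bucket"
PREFIX="some/path"
TABLE="dataset_name.table_name"

# Print a command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Copy the local CSV files under one common prefix (-m copies in parallel).
run gsutil -m cp "${LOCAL_DIR}"/*.csv "gs://${BUCKET}/${PREFIX}/"

# 2. One load job; the single wildcard covers every CSV under the prefix.
run bq load --source_format=CSV "${TABLE}" "gs://${BUCKET}/${PREFIX}/*.csv"
```

Running the script as-is only prints the two commands; setting `DRY_RUN=0` in the environment executes them.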

shollyman answered Oct 16 '22 04:10


Note that

You can use only one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.

So something like gs://my_bucket/some/*/files*, which uses two wildcards in one URI, is not supported.

Source: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage#load-wildcards
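If the files sit under several prefixes, one way to stay within the one-wildcard-per-URI rule is to pass bq load a comma-separated list of URIs, each with its own single wildcard. The sketch below only builds and prints such a command; the subfolder names (2016, 2017) and the `join_uris` helper are hypothetical, chosen for illustration:

```shell
#!/bin/sh
set -eu

# Join all arguments with commas: join_uris a b c -> a,b,c
# (runs in a subshell so the IFS change does not leak out).
join_uris() (
  IFS=,
  printf '%s\n' "$*"
)

# Hypothetical layout: one URI per subfolder, each with a single wildcard.
URIS="$(join_uris \
  "gs://my_bucket/some/2016/files*" \
  "gs://my_bucket/some/2017/files*")"

# Print the resulting command instead of running it.
echo "bq load --source_format=CSV dataset_name.table_name \"${URIS}\""
```

The URI list is quoted in the printed command so the local shell does not try to expand the wildcards itself.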

Sebastian answered Oct 16 '22 04:10


The files can also be in subdirectories. To recursively include all CSV files under a folder:

bq load --source_format=CSV \
dataset_name.table_name \
"gs://my_bucket/folder/*.csv"

Here the single wildcard covers both the intermediate path and the filename (e.g. * can expand to subfolder/folder2/filename).

itaniumatrix answered Oct 16 '22 05:10