Add filename as column on import to BigQuery?

Question

This is a question about importing data files from Google Cloud Storage to BigQuery.

I have a number of JSON files that follow a strict naming convention to include some key data not included in the JSON data itself.

For example:

xxx_US_20170101.json.gz
xxx_GB_20170101.json.gz
xxx_DE_20170101.json.gz

Which is client_country_date.json.gz At the moment, I have some convoluted processes in a Ruby app that reads the files, appends the additional data and then writes it back to a file that is then imported into a single daily table for the client in BigQuery.

I am wondering if it is possible to grab and parse the filename as part of the import to BigQuery? I could then drop the convoluted Ruby processes which occasionally fail on larger files.

Felipe Hoffa · Accepted Answer

You could define an external table pointing to your files:

enter image description here

Note that the table type is "external table", and that it points to multiple files with the * glob.

Now you can query for all data in these files, and query for the meta-column _FILE_NAME:

#standardSQL
SELECT *, _FILE_NAME filename
FROM `project.dataset.table`

You can now save these results to a new native table.

enter image description here

Add filename as column on import to BigQuery?

Tags:

google-cloud-storage

google-bigquery

Raoot

1 Answers

Felipe Hoffa

Recent Activity

Donate For Us

Add filename as column on import to BigQuery?

Tags:

google-cloud-storage

google-bigquery

Raoot

1 Answers

Felipe Hoffa

Related questions

Recent Activity

Donate For Us