
Add filename as column on import to BigQuery?

This is a question about importing data files from Google Cloud Storage to BigQuery.

I have a number of JSON files that follow a strict naming convention to include some key data not included in the JSON data itself.

For example:

xxx_US_20170101.json.gz
xxx_GB_20170101.json.gz
xxx_DE_20170101.json.gz

The pattern is client_country_date.json.gz.

At the moment, I have some convoluted processes in a Ruby app that read the files, append the additional data and then write the result back to a file, which is then imported into a single daily table for the client in BigQuery.

I am wondering if it is possible to grab and parse the filename as part of the import to BigQuery? I could then drop the convoluted Ruby processes which occasionally fail on larger files.

Raoot asked Oct 26 '25 05:10


1 Answer

You could define an external table pointing to your files. Note that the table type is "external table", and that it points to multiple files with the * glob.
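
As an alternative to creating the table through the web UI, the same external table can probably be declared with a DDL statement. This is only a sketch: the bucket name `my-bucket` is a hypothetical placeholder, the schema is left to auto-detection, and the format and compression options assume gzipped newline-delimited JSON as described in the question.

```sql
-- Sketch only: `my-bucket` is a hypothetical bucket name, not from the question.
-- No column list is given, so BigQuery auto-detects the schema from the JSON.
CREATE EXTERNAL TABLE `project.dataset.table`
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',       -- one JSON object per line
  compression = 'GZIP',                    -- the files end in .json.gz
  uris = ['gs://my-bucket/xxx_*.json.gz']  -- the * glob matches every country and date
);
```

The table stores no data itself; each query reads the matching files in Cloud Storage at query time.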

Now you can query all the data in these files, along with the pseudo-column _FILE_NAME, which holds the full gs:// path of the file each row came from:

#standardSQL
SELECT *, _FILE_NAME AS filename
FROM `project.dataset.table`

You can now save these results to a new native table.
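
One way to do this in a single step, and to split the filename into the client, country and date parts from the question, might look like the sketch below. The destination table name `daily_native` and the output column names `client`, `country` and `file_date` are assumptions made for illustration, and the regular expressions assume every file follows the client_country_date.json.gz pattern exactly, with a two-letter uppercase country code and a YYYYMMDD date.

```sql
-- Sketch only: `daily_native` and the three derived column names are hypothetical.
CREATE TABLE `project.dataset.daily_native` AS
SELECT
  *,
  _FILE_NAME AS filename,  -- the pseudo-column must be aliased to be stored
  -- Everything between the last "/" and the "_<COUNTRY>_<DATE>" suffix
  REGEXP_EXTRACT(_FILE_NAME, r'/([^/]+)_[A-Z]{2}_\d{8}\.json\.gz$') AS client,
  -- Two-letter country code, e.g. US, GB, DE
  REGEXP_EXTRACT(_FILE_NAME, r'_([A-Z]{2})_\d{8}\.json\.gz$') AS country,
  -- Eight-digit date converted to a DATE value
  PARSE_DATE('%Y%m%d',
    REGEXP_EXTRACT(_FILE_NAME, r'_(\d{8})\.json\.gz$')) AS file_date
FROM `project.dataset.table`;
```

Rows from a file whose name does not match the pattern would get NULL in the derived columns, so a quick check for NULLs after the load is a cheap way to catch misnamed files.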


Felipe Hoffa answered Oct 29 '25 07:10


