This is a question about importing data files from Google Cloud Storage to BigQuery.
I have a number of JSON files that follow a strict naming convention to include some key data not included in the JSON data itself.
For example:
xxx_US_20170101.json.gz
xxx_GB_20170101.json.gz
xxx_DE_20170101.json.gz
Which is client_country_date.json.gz At the moment, I have some convoluted processes in a Ruby app that reads the files, appends the additional data and then writes it back to a file that is then imported into a single daily table for the client in BigQuery.
I am wondering if it is possible to grab and parse the filename as part of the import to BigQuery? I could then drop the convoluted Ruby processes which occasionally fail on larger files.
You could define an external table pointing to your files:

Note that the table type is "external table", and that it points to multiple files with the * glob.
Now you can query for all data in these files, and query for the meta-column _FILE_NAME:
#standardSQL
SELECT *, _FILE_NAME filename
FROM `project.dataset.table`
You can now save these results to a new native table.

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With