Looking to implement a simple datastore for a departmental team that currently manages a load of Excel/CSV files. We will get them to prepare the files and drop them in CSV format into a GCS bucket, and then point an external BQ table at the bucket (that all works great).
However, if they run a query, see some data, and then want to find where that data was actually pulled from, how can we find out which file contains the row(s) in question (assuming there are no contextual clues in the filename)?
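You can use the _FILE_NAME pseudo column to see which file each row of an external table came from; its value is the full gs:// path of the source file. Note that the pseudo column only works for external tables, and because the name is reserved you have to select it with an alias. The example below reads Avro files, but the pseudo column works the same way for CSV. Example: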
bq query --external_table_definition=externalTable::AVRO=gs://mybucket/f* 'SELECT _FILE_NAME as f FROM externalTable'
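For a permanent external table like the one in the question, the same idea can be applied with a row filter, so that each matching row comes back with the file it was read from. Here is a minimal sketch that only builds the query text; the function name, the table name `my-project.team_data.sales_external`, and the `customer_id` column are hypothetical placeholders, not anything from your setup.

```python
def build_source_file_query(table: str, row_filter: str) -> str:
    """Build a GoogleSQL query that returns matching rows plus their source file.

    `table` is a fully qualified external table name (hypothetical here).
    `row_filter` is a WHERE-clause expression; it is inserted as-is, so it
    should only ever come from a trusted author, not from end-user input.
    """
    return (
        # _FILE_NAME is a reserved name, so it has to be aliased.
        "SELECT *, _FILE_NAME AS source_file\n"
        f"FROM `{table}`\n"
        f"WHERE {row_filter}"
    )


query = build_source_file_query(
    "my-project.team_data.sales_external",  # hypothetical external table
    "customer_id = 'C-1042'",               # hypothetical column and value
)
print(query)
```

The printed SQL can be pasted into the BigQuery console or run with bq query --use_legacy_sql=false; the source_file column then shows the gs:// path for each row that matched.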