I have external tables created in AWS spectrum to query the s3 data however i am not able to identify the filenames which the record belongs to(i have thousands of files under a bucket)
In AWS Athena we have a pseudo column "$PATH" which will display the s3 filenames is there any similar ways available while using spectrum?
Since recently, you can use specific pseudo-columns to access the path and the size of the object in S3 for lineage information.
http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html#c-spectrum-external-tables-pseudocolumns
An example for such a query would be:
>> select distinct "$path", "$size" from spectrum.sales_part;
$path | $size
---------------------------------------+-------
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/ | 1616
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ | 1444
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ | 1444
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With