Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Redshift Spectrum - how to get the s3 filenames in the external table

I have external tables created in AWS spectrum to query the s3 data however i am not able to identify the filenames which the record belongs to(i have thousands of files under a bucket)

In AWS Athena we have a pseudo column "$PATH" which will display the s3 filenames is there any similar ways available while using spectrum?

like image 632
Rajeev Avatar asked Mar 08 '23 19:03

Rajeev


1 Answers

Since recently, you can use specific pseudo-columns to access the path and the size of the object in S3 for lineage information.

http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html#c-spectrum-external-tables-pseudocolumns

An example for such a query would be:

>> select distinct "$path", "$size" from spectrum.sales_part;

 $path                                 | $size
---------------------------------------+-------
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/ |  1616
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ |  1444
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ |  1444
like image 105
grundprinzip Avatar answered Apr 19 '23 23:04

grundprinzip