 

Retaining source file name while importing data from s3 to Redshift

I have a large number of files in an S3 bucket that I regularly import into Redshift. Because there are so many files, I need a column in the Redshift table that contains the name of the S3 source file each row came from.

Is there any way to do this?

Pramil Paudel Avatar asked Feb 27 '26 02:02



2 Answers

I agree with Ketan that this is currently not possible in Redshift itself. If you need it, you can get there in one of two ways:

  1. Read the S3 files programmatically, write new S3 files that include the file name as an extra column, and load the new files.
  2. Alternatively, use Hive. Create an external table over the S3 bucket location, use the INPUT__FILE__NAME virtual column to get the file names, write the result to a new table, and then write it back to S3. You can also do some preprocessing in Hive.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
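
Option 1 above could be sketched roughly as follows in Python. This is a minimal illustration, not a tested pipeline: it assumes the files are small, UTF-8 encoded, delimiter-separated text without a header row, and that boto3 is installed and configured. The function names `add_source_column` and `tag_bucket_files`, and the bucket and prefix arguments, are hypothetical.

```python
import csv
import io


def add_source_column(csv_text: str, source_name: str, delimiter: str = ",") -> str:
    """Return csv_text with source_name appended as the last field of every row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row + [source_name])
    return out.getvalue()


def tag_bucket_files(bucket: str, src_prefix: str, dst_prefix: str) -> None:
    """Rewrite every object under src_prefix to dst_prefix with its key as an extra column."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "folder" placeholder objects
                continue
            # Assumption: each file fits in memory and is UTF-8 text.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            tagged = add_source_column(body, key)
            new_key = dst_prefix + key[len(src_prefix):]
            s3.put_object(Bucket=bucket, Key=new_key, Body=tagged.encode("utf-8"))
```

After running something like `tag_bucket_files("my-bucket", "raw/", "tagged/")` (placeholder names), the files under the destination prefix can be loaded with COPY into a table that has one extra column for the source file name.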

Hope this helps.

Rishi Avatar answered Mar 01 '26 19:03



That isn't possible. During a COPY operation, Redshift only loads file contents into a table; it doesn't provide access to S3 file names.

To achieve what you want, you need to preprocess the data so that each file carries its own name inside its contents, for example as an extra column.

ketan vijayvargiya Avatar answered Mar 01 '26 20:03



