Retaining source file name while importing data from s3 to Redshift

Question

I have a large number of files in an S3 bucket that I regularly import into Redshift. Because there are so many files, I need a column in the Redshift table that contains the name of the S3 source file each row came from.

Is there any way to do this?

Rishi · Accepted Answer

I agree with Ketan that this is currently not possible in Redshift itself. If you need this, there are two workarounds:

Read the S3 files programmatically, write new S3 files that include the file name as an additional column, and load the new files.
Alternatively, use Hive. Create an external table on the S3 bucket location and use the INPUT__FILE__NAME virtual column to get the file names, write the result into a new table, and then write it back to S3. You can also do some pre-processing in Hive.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns

Hope this helps.

ketan vijayvargiya · Answer

That isn't possible. During a COPY operation, Redshift only loads file contents into a table; it doesn't provide access to S3 file names.

To achieve what you want, you need to preprocess the data to add additional information inside the files.
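The preprocessing step both answers describe could be sketched roughly as below. This is only an illustration under some assumptions: the files are plain CSV, the helper name add_source_column and the column name source_file are made up for the example, and fetching each object from S3 and uploading the rewritten copy (for instance with an SDK such as boto3) is left out.

```python
import csv
import io


def add_source_column(csv_text, source_name, has_header=False,
                      column_name="source_file"):
    """Return csv_text with source_name appended as the last field of each row.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_pending = has_header
    for row in reader:
        if not row:
            continue  # skip blank lines
        if header_pending:
            # The header row gets the new column's name, not the file name.
            writer.writerow(row + [column_name])
            header_pending = False
        else:
            writer.writerow(row + [source_name])
    return out.getvalue()


if __name__ == "__main__":
    # The S3 key below is a placeholder, not a real location.
    sample = "1,alice\n2,bob\n"
    print(add_source_column(sample, "s3://my-bucket/2015/01/users.csv"), end="")
```

The rewritten files can then be loaded with COPY into a table that has one extra column (for example a VARCHAR column) to hold the file name.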

Tags:

amazon-web-services

amazon-s3

amazon-ec2

amazon-redshift

Pramil Paudel
