
Copying specific columns into Amazon Redshift from an S3 bucket

I have a file in S3 with the following columns:

CustomerID   CustomerName   ProductID    ProductName   Price   Date

The existing table in Redshift has this structure:

Date  CustomerID   ProductID    Price

Is there a way to copy only the selected columns into the existing table? The file in S3 has no header row, just the data in this order.

asked Aug 22 '16 at 09:08 by Bitanshu Das



1 Answer

This answer covers the case where the file has fewer columns than the target table.

Assuming that CustomerName and ProductName are nullable, you have two options.

Option #1: Load directly into the main table

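    -- The column list maps the file's fields, in order, to these columns;
    -- columns left out (CustomerName, ProductName) get their DEFAULT or NULL
    COPY main_tablename
    (Date
    ,CustomerID
    ,ProductID
    ,Price)
    FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

    ANALYZE main_tablename;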
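The `credentials` string with access keys works, but COPY can also authenticate with an IAM role, which keeps keys out of the SQL. The sketch below is Option #1 in that form. The role ARN is a placeholder, and the tab delimiter is an assumption about the file: without a DELIMITER clause, COPY expects pipe-separated (`|`) text.

```sql
-- Option #1 again, authenticated with an IAM role instead of access keys.
-- Placeholder ARN: the role must be associated with the cluster and be
-- allowed to read the bucket.
COPY main_tablename (Date, CustomerID, ProductID, Price)
FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
-- Assumption: the fields are tab-separated; adjust to match the real file.
DELIMITER '\t';

ANALYZE main_tablename;
```

If a load fails, Redshift records the offending line and the reason in the `STL_LOAD_ERRORS` system table, which is usually the quickest way to spot a delimiter or column-count mismatch.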

Option #2: Load the data into a staging table, then join the staging table with the reference data to insert the rows into the main table.

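    -- Load the four file fields into the staging table
    COPY staging_tablename
    (Date
    ,CustomerID
    ,ProductID
    ,Price)
    FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

    -- Look up the names in the reference tables; INNER JOIN drops staged
    -- rows whose CustomerID or ProductID has no match
    INSERT INTO main_tablename
        (CustomerID, CustomerName, ProductID, ProductName, Price, Date)
    SELECT st.CustomerID
          ,cust.CustomerName
          ,st.ProductID
          ,prod.ProductName
          ,st.Price
          ,st.Date
    FROM staging_tablename st
    INNER JOIN customer_tablename cust ON (cust.CustomerID = st.CustomerID)
    INNER JOIN product_tablename prod ON (prod.ProductID = st.ProductID);

    -- Empty the staging table for the next load
    -- (in Redshift, TRUNCATE also commits the current transaction)
    TRUNCATE TABLE staging_tablename;

    ANALYZE main_tablename;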
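The staging pattern also fits the layout described in the question, where the file has six fields and the existing table only four. As far as I know, COPY maps the fields of a delimited text file to the column list by position and rejects rows with extra fields, so it cannot skip fields directly. A minimal sketch is to stage every field and then insert only the wanted columns. The names `stage_all_fields` and `existing_table`, the column types, and the tab delimiter are all assumptions; adjust them to the real file and table.

```sql
-- Hypothetical temp table with all six fields, in the same order as the file.
-- Column types are assumptions; use the types of the real data.
CREATE TEMP TABLE stage_all_fields (
    CustomerID   INTEGER,
    CustomerName VARCHAR(256),
    ProductID    INTEGER,
    ProductName  VARCHAR(256),
    Price        DECIMAL(12,2),
    Date         DATE
);

-- Load every field. Assumption: the file is tab-delimited.
COPY stage_all_fields
FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
DELIMITER '\t';

-- existing_table is a hypothetical name for the four-column table
-- (Date, CustomerID, ProductID, Price); copy only those columns.
INSERT INTO existing_table (Date, CustomerID, ProductID, Price)
SELECT Date, CustomerID, ProductID, Price
FROM stage_all_fields;

ANALYZE existing_table;
-- The temp table is dropped automatically when the session ends.
```

A temp table avoids managing a permanent staging table, at the cost of staging the two name columns that are then discarded.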
answered Sep 20 '22 at 16:09 by BigDataKid