
Amazon Redshift: Insert data into table from S3 using Java API

I currently have a file in S3. I would like to use the AWS SDK for Java to load this data into a Redshift table, creating the table first if it does not exist. I have been unable to find any clear examples of how to do this, so I am wondering whether I am going about it the wrong way. Should I be using a standard PostgreSQL JDBC driver instead of the AWS SDK?

Dan Ciborowski - MSFT asked Jul 17 '13 14:07





2 Answers

Connect to the cluster over JDBC (see http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-in-code.html#connecting-in-code-java) and submit your CREATE TABLE and COPY commands as SQL statements.
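As a rough sketch of what submitting those two commands over JDBC might look like (this is not the answerer's code): the class and method names below are hypothetical, `CREATE TABLE IF NOT EXISTS` covers the case where the table may be missing, and the COPY uses an `IAM_ROLE` clause as an assumed alternative to embedding access keys. The inputs are concatenated into SQL, so they are assumed to come from trusted configuration, not from users.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; not part of the AWS SDK or any JDBC driver.
final class RedshiftLoadSql {

    // Builds "CREATE TABLE IF NOT EXISTS <table> (<column definitions>)".
    static String createTableIfNotExists(String table, String columnDefs) {
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + columnDefs + ")";
    }

    // Builds a COPY statement for a CSV file with one header row.
    // Assumption: the cluster has an attached IAM role that can read the S3 object.
    static String copyCsvFromS3(String table, String s3Uri, String iamRoleArn) {
        return "COPY " + table
                + " FROM '" + s3Uri + "'"
                + " IAM_ROLE '" + iamRoleArn + "'"
                + " CSV IGNOREHEADER 1";
    }

    // Runs both statements in order on an already open connection.
    // With JDBC's default auto-commit mode, each statement is committed as it completes.
    static void createAndLoad(Connection conn, String table, String columnDefs,
                              String s3Uri, String iamRoleArn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(createTableIfNotExists(table, columnDefs));
            st.execute(copyCsvFromS3(table, s3Uri, iamRoleArn));
        }
    }
}
```

With an open `Connection conn`, a call such as `RedshiftLoadSql.createAndLoad(conn, "my_table", "id INTEGER, name VARCHAR(64)", s3Uri, roleArn)` would create the table if needed and then load the file.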

Guy answered Sep 28 '22 04:09



Guy's answer covers most of what is needed.

I would like to post working Java JDBC code that copies data from S3 into a Redshift table. I hope it helps others.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class RedShiftJDBC {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "username***");
        props.setProperty("password", "password****");

        // The PostgreSQL JDBC driver works too; in that case use a jdbc:postgresql:// URL instead.
        //String url = "jdbc:postgresql://********url-to-redshift.redshift.amazonaws.com:5439/example-database";
        String url = "jdbc:redshift://********url-to-redshift.redshift.amazonaws.com:5439/example-database";

        String command = "COPY my_table FROM 's3://path/to/csv/example.csv' "
                + "CREDENTIALS 'aws_access_key_id=******;aws_secret_access_key=********' "
                + "CSV DELIMITER ',' IGNOREHEADER 1";

        try {
            // Load the driver that matches the URL scheme above.
            //Class.forName("org.postgresql.Driver");
            // Make sure the appropriate Redshift JDBC driver jar is on the classpath.
            Class.forName("com.amazon.redshift.jdbc42.Driver");

            System.out.println("Connecting to database...");
            // try-with-resources closes the statement and connection even if the COPY fails.
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement statement = conn.createStatement()) {
                System.out.println("Connection made!");

                // JDBC connections start in auto-commit mode, and drivers typically
                // reject commit() in that mode, so switch it off before committing manually.
                conn.setAutoCommit(false);

                System.out.println("Executing...");
                statement.executeUpdate(command);

                // Commit so the loaded rows are saved and visible to other sessions.
                conn.commit();
                System.out.println("That's all: a copy using simple JDBC.");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
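The commit step above generalizes to any batch of statements. A minimal sketch, assuming a hypothetical helper `JdbcTransaction.runAtomically` (not part of JDBC or any AWS library): it disables auto-commit, runs the statements, commits on success, and rolls back on failure, using only standard `java.sql` calls.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper for illustration only.
final class JdbcTransaction {

    // Runs all statements in one transaction: commit if every statement
    // succeeds, roll back if any of them throws.
    static void runAtomically(Connection conn, List<String> statements) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            for (String sql : statements) {
                st.executeUpdate(sql);
            }
            conn.commit();
        } catch (SQLException e) {
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                // Keep the original error and attach the rollback failure to it.
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            // Restore whatever mode the caller had set.
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

For example, passing a CREATE TABLE statement followed by the COPY command would either apply both or neither, assuming the database treats both statements as transactional.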
Red Boy answered Sep 28 '22 03:09
