How to use Ruby to write individual records to a Redshift database?

Currently, we have a script that parses data and uploads it one record at a time to a MySQL database. Recently, we decided to switch to AWS Redshift.

Is there a way I can use my Amazon login credentials and my Redshift cluster information to upload these records directly to the Redshift database?

All the guides I'm finding online recommend importing text or CSV files from an S3 bucket, but that is not very practical for my use case.

Thanks for any help

I'm looking to do something like this:

require 'aws-sdk'
require 'pg'

AWS.config(access_key_id: 'my_access_key_id', secret_access_key: 'my_secret_access_key', region: 'us-west-2')

redshift = AWS::Redshift.new

credentials = {
    driver: "org.postgresql.Driver",
    url: "my_connect_url",
    username: "my_username",
    password: "my_password",
    database: "my_db"
}

db = redshift.connect(credentials) # NOT A REAL LINE OF CODE, I WISH IT WAS

sql_query = "INSERT INTO my_table (my_column) 
        VALUES ('hello world'); " 

db.query(sql_query)
db.close
johncorser asked Mar 20 '23 05:03



1 Answer

Really, what you should do here is write your records one at a time to a file in S3, then periodically load that file. Redshift is much more efficient at loading a 100,000-line file than at, say, inserting 100 rows one by one (a rough estimate from my experience). If you really want to insert record by record, you can do that with any standard PostgreSQL client library for Ruby, since Redshift accepts connections from PostgreSQL JDBC/ODBC drivers. That's more or less what your sample program is trying to do.

I don't recommend doing this, but here is the documentation for INSERT: http://docs.aws.amazon.com/redshift/latest/dg/r_INSERT_30.html
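If you do go the INSERT route, Redshift's INSERT also accepts several rows in one VALUES list, which saves round trips compared with one statement per record. Here is a minimal sketch, assuming you send the statement through the pg gem's `exec_params` with numbered `$1, $2, ...` placeholders. `build_multi_row_insert` is a hypothetical helper, and it does not escape the table or column names, so only pass trusted identifiers.

```ruby
# Builds one multi-row INSERT with numbered placeholders ($1, $2, ...) so the
# values travel separately from the SQL text (e.g. via pg's exec_params).
# Hypothetical helper: table and column names are NOT escaped here.
def build_multi_row_insert(table, columns, rows)
  raise ArgumentError, 'rows must not be empty' if rows.empty?

  params = []
  value_groups = rows.map do |row|
    unless row.length == columns.length
      raise ArgumentError, "expected #{columns.length} values, got #{row.length}"
    end

    # Each value gets the next placeholder number and is appended to params.
    placeholders = row.map do |value|
      params << value
      "$#{params.length}"
    end
    "(#{placeholders.join(', ')})"
  end

  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{value_groups.join(', ')}"
  [sql, params]
end

sql, params = build_multi_row_insert('my_table', ['my_column'], [['hello world'], ['goodbye world']])
puts sql  # => INSERT INTO my_table (my_column) VALUES ($1), ($2)
p params  # => ["hello world", "goodbye world"]
```

With a pg connection `conn`, you would then run `conn.exec_params(sql, params)`. For very large batches I'd split the rows into chunks, since the PostgreSQL protocol limits how many bind parameters one statement can carry.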

I would check out this question about appending to an S3 file. This is REALLY what you want to do:

Ruby - Append content at the end of the existing s3 file using fog

EDIT: I jumped on that question without reading the answer. Correction: standard S3 objects can't be appended to, so you need to create the file locally, upload it to S3 once it reaches a certain size, and then run the Redshift load (COPY) command.
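The local-file approach could look roughly like this. It is a minimal sketch using only Ruby's standard library: `LocalBatchWriter` is a hypothetical class, and the block you pass it stands in for your own "upload to S3, then COPY" step, which is not shown. It counts the bytes it writes and hands off the current file once it reaches `max_bytes`.

```ruby
require 'csv'
require 'tmpdir'

# Appends records to a local CSV file and hands the file off once it reaches
# max_bytes. The on_full block is a hypothetical hook standing in for your own
# "upload to S3, then run COPY" step.
class LocalBatchWriter
  def initialize(dir, max_bytes:, &on_full)
    @dir = dir
    @max_bytes = max_bytes
    @on_full = on_full
    @batch = 0
    open_new_file
  end

  # Writes one record (an array of column values) as a CSV line.
  def write(record)
    line = CSV.generate_line(record)
    @file.write(line)
    @bytes += line.bytesize
    rotate if @bytes >= @max_bytes
  end

  # Hands off whatever is left; call once when the parser is done.
  def close
    rotate(reopen: false)
  end

  private

  def open_new_file
    @batch += 1
    @bytes = 0
    @path = File.join(@dir, format('batch_%05d.csv', @batch))
    @file = File.open(@path, 'w')
  end

  # Closes the current file, passes it to on_full (or deletes it if empty),
  # and optionally starts the next batch file.
  def rotate(reopen: true)
    @file.close
    if @bytes.zero?
      File.delete(@path)
    else
      @on_full.call(@path)
    end
    open_new_file if reopen
  end
end

Dir.mktmpdir do |dir|
  writer = LocalBatchWriter.new(dir, max_bytes: 20) { |path| puts "ready: #{File.basename(path)}" }
  writer.write(['1', 'hello world'])
  writer.write(['2', 'goodbye world'])
  writer.close
end
# prints: ready: batch_00001.csv
```

The `max_bytes: 20` threshold only exists to make the demo rotate; in practice you'd pick something much larger so each COPY loads a big file.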

And here is the documentation for loading data into Redshift from S3: http://docs.aws.amazon.com/redshift/latest/dg/t_Loading-data-from-S3.html
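Once the files are in S3, the load is a single COPY statement. A sketch of building it, assuming the cluster has an IAM role attached that can read the bucket (older examples pass access keys with CREDENTIALS instead of IAM_ROLE). The bucket, prefix, and role ARN are hypothetical placeholders. COPY loads every object whose key starts with the given prefix, so one statement can pick up a whole set of batch files.

```ruby
# Builds a Redshift COPY statement that loads all S3 objects under a key
# prefix as CSV. Table, bucket, prefix, and role ARN are hypothetical.
def build_copy_sql(table, s3_prefix, iam_role_arn)
  # Single quotes inside SQL string literals are escaped by doubling them.
  quote = ->(s) { "'#{s.gsub("'", "''")}'" }
  "COPY #{table} FROM #{quote.(s3_prefix)} IAM_ROLE #{quote.(iam_role_arn)} FORMAT AS CSV;"
end

puts build_copy_sql('my_table', 's3://my-bucket/batches/',
                    'arn:aws:iam::123456789012:role/MyRedshiftRole')
# => COPY my_table FROM 's3://my-bucket/batches/' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' FORMAT AS CSV;
```

You would run the resulting string over the same kind of pg connection you'd use for an INSERT, e.g. `conn.exec(sql)`.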

Or you could try loading data from remote hosts. I have never done this, but it basically skips staging the data in S3; you still want to move a large file, though. http://docs.aws.amazon.com/redshift/latest/dg/loading-data-from-remote-hosts.html
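For the remote-host route, as far as I can tell from that page, you still put a small JSON manifest in S3 that lists the SSH hosts and the command whose output Redshift should load, then run COPY against the manifest with the SSH option. A sketch of generating the manifest; the host, user, and command are hypothetical, and you also need to add the cluster's public key to each host's authorized keys, which is not shown.

```ruby
require 'json'

# Builds the JSON manifest a Redshift COPY ... SSH load reads from S3.
# Each entry names a host Redshift connects to over SSH and the command
# whose standard output it loads. All values passed in are hypothetical.
def build_ssh_manifest(hosts, command:, username:)
  entries = hosts.map do |host|
    { 'endpoint' => host, 'command' => command, 'mandatory' => true, 'username' => username }
  end
  JSON.pretty_generate({ 'entries' => entries })
end

puts build_ssh_manifest(['ec2-host.example.com'],
                        command: 'cat /data/records.csv', username: 'ec2-user')
```

After uploading the manifest (say to a hypothetical `s3://my-bucket/ssh_manifest`), the load would be something like `COPY my_table FROM 's3://my-bucket/ssh_manifest' IAM_ROLE '...' CSV SSH;`.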

And lastly, if you really want record-by-record inserts, you should probably use RDS instead of Redshift; you will get better performance unless your dataset is huge.

Okay, this is my attempt at Ruby. To be honest, I have never written Ruby before, but really it's just a connection to a PostgreSQL database. You are trying to connect to Redshift through the AWS SDK, which is used to launch, resize, and manage clusters. For running SQL, connect to Redshift through a PostgreSQL-compatible client instead: a JDBC/ODBC driver, SQL Workbench, the psql Linux CLI, etc.

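require 'pg'

host = 'redshift-xxxx.aws.com'  # your cluster endpoint
port = 5439                     # Redshift's default port
dbname = 'myDB'
login = 'master'
password = 'M@st3rP@ssw0rd'

# Redshift speaks the PostgreSQL wire protocol, so the pg gem can connect to it.
conn = PG.connect(host: host, port: port, dbname: dbname, user: login, password: password)

# Insert one record; exec_params sends the value separately from the SQL text.
conn.exec_params('INSERT INTO my_table (my_column) VALUES ($1)', ['hello world'])
conn.close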

Here host, port, dbname, login, and password are all set up when you launch the Redshift cluster. dbname is a PostgreSQL concept (the database inside the cluster that you connect to); do you know much about PostgreSQL?

Dan Ciborowski - MSFT answered Apr 22 '23 13:04
