I am using OS X Yosemite. I have installed the packages postgresql, psycopg2, and simplejson with conda install <package name>. After installation I imported these packages. Then I created a JSON file with my Amazon Redshift credentials:
{
"user_name": "YOUR USER NAME",
"password": "YOUR PASSWORD",
"host_name": "YOUR HOST NAME",
"port_num": "5439",
"db_name": "YOUR DATABASE NAME"
}
I tried to read it with
with open("Credentials.json") as fh:
    creds = simplejson.loads(fh.read())
But this throws an error. These were the instructions given on a website; I searched other sites but none explains this well.
Please let me know how I can connect Jupyter to Amazon Redshift.
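For the credentials step itself, here is a minimal sketch of reading the JSON file above and assembling a connection URL (the file name and keys follow the JSON shown in the question; the stdlib json module parses the same files as simplejson, and load_creds/build_url are illustrative names, not from any library):

```python
import json  # stdlib json; simplejson exposes the same interface

def load_creds(path):
    """Read the credentials JSON shown above into a dict."""
    with open(path) as fh:
        return json.load(fh)

def build_url(creds):
    """Assemble a SQLAlchemy-style connection URL from the credential keys."""
    return ("postgresql+psycopg2://{user_name}:{password}"
            "@{host_name}:{port_num}/{db_name}").format(**creds)

# Usage (requires a reachable Redshift cluster):
# creds = load_creds("Credentials.json")
# engine = sqlalchemy.create_engine(build_url(creds))
```

Note the body of the with block must be indented; an unindented second line is the most common cause of an error with this snippet.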
There's a nice guide from RJMetrics: "Setting up Your Analytics Stack with Jupyter Notebook & AWS Redshift". It uses the ipython-sql extension.
This works great and displays results in a grid.
In [1]:
import sqlalchemy
import psycopg2
import simplejson
%load_ext sql
%config SqlMagic.displaylimit = 10
In [2]:
with open("./my_db.creds") as fh:
    creds = simplejson.loads(fh.read())
connect_to_db = 'postgresql+psycopg2://' + \
    creds['user_name'] + ':' + creds['password'] + '@' + \
    creds['host_name'] + ':' + creds['port_num'] + '/' + creds['db_name']
%sql $connect_to_db
In [3]:
%sql SELECT * FROM my_table LIMIT 25;
Here's how I do it:
----INSERT IN CELL 1-----
import psycopg2
redshift_endpoint = "<add your endpoint>"
redshift_user = "<add your user>"
redshift_pass = "<add your password>"
port = <your port>
dbname = "<your db name>"
----INSERT IN CELL 2-----
from sqlalchemy import create_engine
from sqlalchemy import text
engine_string = "postgresql+psycopg2://%s:%s@%s:%d/%s" \
% (redshift_user, redshift_pass, redshift_endpoint, port, dbname)
engine = create_engine(engine_string)
----INSERT IN CELL 3 - THIS EXAMPLE WILL GET ALL TABLES FROM YOUR DATABASE-----
sql = """
select schemaname, tablename from pg_tables order by schemaname, tablename;
"""
----LOAD RESULTS AS TUPLES TO A LIST-----
tables = []
output = engine.execute(sql)  # SQLAlchemy 1.x; in 2.x use engine.connect() and conn.execute(text(sql))
for row in output:
    tables.append(row)
tables
--IF YOU'RE USING PANDAS---
import pandas as pd
raw_data = pd.read_sql_query(text(sql), engine)
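If you already fetched the rows as tuples in cell 3, you can also build the DataFrame locally without a second round trip to Redshift (a sketch; the sample rows are made up, and the column names match the pg_tables query above):

```python
import pandas as pd

# rows as fetched in cell 3, e.g. from SELECT schemaname, tablename FROM pg_tables
tables = [("public", "users"), ("public", "orders")]

# build the DataFrame from the already-fetched tuples
df = pd.DataFrame(tables, columns=["schemaname", "tablename"])
```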